Product and Method

The present invention relates to oligonucleotide
probes, for use in assessing gene transcript levels in a
cell, which may be used in analytical techniques, 
particularly diagnostic techniques. Conveniently the 
probes are provided in kit form. Different sets of 
probes may be used in techniques to prepare gene 
expression patterns and identify, diagnose or monitor 
different states, such as diseases, conditions or stages 
thereof. Also provided are methods of identifying 
suitable probes and their use in methods of the 
invention. 

The identification of quick and easy methods of 
sample analysis for, for example, diagnostic 
applications, remains the goal of many researchers. End 
users seek methods which are cost effective, produce 
statistically significant results and which may be 
implemented routinely without the need for highly 
skilled individuals.

The analysis of gene expression within cells has 
been used to provide information on the state of those 
cells and importantly the state of the individual from 
which the cells are derived. The relative expression of 
various genes in a cell has been identified as 
reflecting a particular state within a body. For 
example, cancer cells are known to exhibit altered 
expression of various proteins and the transcripts or 
the expressed proteins may therefore be used as markers 
of that disease state. 

Thus biopsy tissue may be analysed for the presence 
of these markers and cells originating from the site of 
the disease may be identified in other tissues or fluids 
of the body by the presence of the markers.
Furthermore, products of the altered expression may be
released into the blood stream and these products may be
analysed. In addition, cells which have contacted
disease cells may be affected by their direct contact
with those cells, resulting in altered gene expression,
and their expression or products of expression may be
similarly analysed. 

However, there are some limitations with these
methods. For example, the use of specific tumour
markers for identifying cancer suffers from a variety of
defects, such as lack of specificity or sensitivity,
association of the marker with disease states besides 
the specific type of cancer, and difficulty of detection 
in asymptomatic individuals. 

In addition to the analysis of one or two marker 
transcripts or proteins, more recently, gene expression
patterns have been analysed. Most of the work involving 
large-scale gene expression analysis with implications 
in disease diagnosis has involved clinical samples 
originating from diseased tissues or cells. For 
example, several recent publications, which demonstrate
that gene expression data can be used to distinguish 
between similar cancer types, have used clinical samples 
from diseased tissues or cells (Alon et al., 1999, PNAS,
96, p6745-6750; Golub et al., 1999, Science, 286, p531-537;
Alizadeh et al., 2000, Nature, 403, p503-511;
Bittner et al., 2000, Nature, 406, p536-540).

However, these methods have relied on analysis of a 
sample containing diseased cells or products of those 
cells or cells which have been contacted by disease 
cells. Analysis of such samples relies on knowledge of
the presence of a disease and its location, which may be 
difficult in asymptomatic patients. Furthermore, 
samples cannot always be taken from the disease site,
e.g. in diseases of the brain.

In a finding of great significance, the present
inventors identified the previously untapped potential
of all cells within a body to provide information
relating to the state of the organism from which the
cells were derived. WO98/49342 describes the analysis
of the gene expression of cells distant from the site of
disease, e.g. peripheral blood collected distant from a
cancer site.

This finding is based on the premise that the 
different parts of an organism's body exist in dynamic 
interaction with each other. When a disease affects one 
part of the body, other parts of the body are also
affected. The interaction results from a wide spectrum
of biochemical signals that are released from the 
diseased area, affecting other areas in the body. 
Although the nature of the biochemical and
physiological changes induced by the released signals
can vary in the different body parts, the changes can be
measured at the level of gene expression and used for
diagnostic purposes.

The physiological state of a cell in an organism is
determined by the pattern with which genes are expressed
in it. The pattern depends upon the internal and
external biological stimuli to which said cell is 
exposed, and any change either in the extent or in the 
nature of these stimuli can lead to a change in the 
pattern with which the different genes are expressed in
the cell. There is a growing understanding that by
analysing the systemic changes in gene expression 
patterns in cells in biological samples, it is possible 
to provide information on the type and nature of the 
biological stimuli that are acting on them. Thus, for
example, by monitoring the expression of a large number
of genes in cells in a test sample, it is possible to 
determine whether their genes are expressed with a 
pattern characteristic for a particular disease, 
condition or stage thereof. Measuring changes in gene
activities in cells, e.g. from tissue or body fluids, is
therefore emerging as a powerful tool for disease
diagnosis.

Such methods have various advantages. Often,
obtaining clinical samples from certain areas in the
body that are diseased can be difficult and may involve
undesirable invasions of the body; for example, biopsy is
often used to obtain samples for cancer. In some cases,
such as in Alzheimer's disease, the diseased brain
specimen can only be obtained post-mortem. Furthermore,
the tissue specimens which are obtained are often
heterogeneous and may contain a mixture of both diseased
and non-diseased cells, making the analysis of generated
gene expression data both complex and difficult.

It has been suggested that a pool of tumour tissues 
that appear to be pathogenetically homogeneous with 
respect to morphological appearances of the tumour may
well be highly heterogeneous at the molecular level
(Alizadeh, 2000, supra), and in fact might contain
tumours representing essentially different diseases
(Alizadeh, 2000, supra; Golub, 1999, supra). For the
purpose of identifying a disease, condition, or a stage
thereof, any method that does not require clinical
samples to originate directly from diseased tissues or
cells is highly desirable since clinical samples 
representing a homogeneous mixture of cell types can be 
obtained from an easily accessible region in the body. 

We have now identified a set of probes of
surprising utility for identifying one or more diseases.
Thus, we now describe probes and sets of probes derived
from cells which are not disease cells and which have
not contacted disease cells, which correspond to genes
which exhibit altered expression in normal versus
disease individuals, for use in methods of identifying,
diagnosing or monitoring certain conditions,
particularly diseases or stages thereof.

Thus the invention provides a set of
oligonucleotide probes which correspond to genes in a
cell whose expression is affected in a pattern
characteristic of a particular disease, condition or
stage thereof, wherein said genes are systemically
affected by said disease, condition or stage thereof. 
Preferably said genes are metabolic or house-keeping 
genes and preferably are constitutively moderately or 
highly expressed. Preferably the genes are moderately 
or highly expressed in the cells of the sample but not 
in cells from disease cells or in cells having contacted 
such disease cells. 

Such probes, particularly when isolated from cells 
distant to the site of disease, do not rely on the 
development of disease to clinically recognizable levels 
and allow detection of a disease or condition or stage 
thereof very early after the onset of said disease or 
condition, even years before other subjective or 
objective symptoms appear.

As used herein "systemically" affected genes refers 
to genes whose expression is affected in the body 
without direct contact with a disease cell or disease 
site and the cells under investigation are not disease 
cells.

"Contact" as referred to herein refers to cells 
coming into close proximity with one another such that 
the direct effect of one cell on the other may be 
observed, e.g. an immune response, wherein these 
responses are not mediated by secondary molecules
released from the first cell over a large distance to
affect the second cell. Preferably contact refers to
physical contact, or contact that is as close as is
sterically possible; conveniently, cells which contact
one another are found in the same unit volume, for
example within 1 cm³.

A "disease cell" is a cell manifesting phenotypic 
changes and is present at the disease site at some time 
during its life-span, e.g. a tumour cell at the tumour 
site or which has disseminated from the tumour, or a
brain cell in the case of brain disorders such as
Alzheimer's disease.

"Metabolic" or "house-keeping" genes refer to those
genes responsible for expressing products involved in
cell division and maintenance, e.g. non-immune function
related genes.

"Moderately or highly" expressed genes refers to
those present in resting cells in a copy number of more
than 30-100 copies/cell (assuming an average 3 x 10^5 mRNA
molecules in a cell).
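
By way of illustration only, the copy-number threshold above can be restated as a fraction of the cell's mRNA pool. The following Python sketch uses the figure of 3 x 10^5 mRNA molecules per cell given in the text; the function names and the choice of the lower bound (30 copies/cell) as the default cut-off are assumptions made for the example and are not part of the definition.

```python
# Assumption taken from the text: a resting cell holds about 3 x 10^5 mRNA molecules.
MRNA_PER_CELL = 3e5


def transcript_fraction(copies_per_cell, total_mrna=MRNA_PER_CELL):
    """Return the share of the cell's mRNA pool taken up by one transcript."""
    return copies_per_cell / total_mrna


def is_moderately_or_highly_expressed(copies_per_cell, threshold=30):
    """Hypothetical helper: apply the lower bound of the 30-100 copies/cell range."""
    return copies_per_cell > threshold


if __name__ == "__main__":
    for copies in (30, 100):
        share = transcript_fraction(copies)
        print(f"{copies} copies/cell is {share:.4%} of the mRNA pool")
```

On these assumptions, 30 copies/cell corresponds to one transcript in 10,000 and 100 copies/cell to one in 3,000.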

Specific probes having the above described 
properties are provided herein.

Thus in one aspect, the present invention provides 
a set of oligonucleotide probes, wherein said set 
comprises at least 10 oligonucleotides selected from:
an oligonucleotide as described in Table 1 or
derived from a sequence described in Table 1, or an
oligonucleotide with a complementary sequence,
or a functionally equivalent oligonucleotide.

"Table 1" as referred to herein refers to Table 1a
and/or Table 1b. Table 1b contains reference to
additional clones and sequences as disclosed herein.
Similarly Tables 2 and 4 comprise 2 parts, a and b.

The invention also provides one or more 
oligonucleotide probes, wherein each oligonucleotide 
probe is selected from the oligonucleotides listed in 
Table 1, or derived from a sequence described in Table
1, or a complementary sequence thereof. The use of such
probes in products and methods of the invention forms
further aspects of the invention.

As referred to herein an "oligonucleotide" is a 
nucleic acid molecule having at least 6 monomers in the
polymeric structure, ie. nucleotides or modified forms
thereof. The nucleic acid molecule may be DNA, RNA or
PNA (peptide nucleic acid) or hybrids thereof or
modified versions thereof, e.g. chemically modified
forms, e.g. LNA (Locked Nucleic acid), by methylation or
made up of modified or non-natural bases during
synthesis, providing they retain their ability to bind
to complementary sequences. Such oligonucleotides are
used in accordance with the invention to probe target 
sequences and are thus referred to herein also as 
oligonucleotide probes or simply as probes.

An "oligonucleotide derived from a sequence
described in Table 1" (or any other table) refers to a
part of a sequence disclosed in that Table (e.g. Tables
1-4), which satisfies the requirements of the
oligonucleotide probes as described herein, e.g. in
length and function. Preferably said parts have the size
described hereinafter. 

Preferably the oligonucleotide probes forming said 
set are at least 15 bases in length to allow binding of 
target molecules. Especially preferably said 
oligonucleotide probes are from 20 to 200 bases in
length, e.g. from 30 to 150 bases, preferably 50-100
bases in length. 

As referred to herein the term "complementary 
sequences" refers to sequences with consecutive 
complementary bases (ie. T:A, G:C) and which 
complementary sequences are therefore able to bind to 
one another through their complementarity. 
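
The definition of complementary sequences can be sketched in code. The following Python example is a minimal illustration for DNA only, pairing T:A and G:C as stated above; it assumes the two strands pair over their full length in antiparallel orientation, and the function names are hypothetical.

```python
# Watson-Crick pairs named in the text (T:A, G:C), DNA alphabet only.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the strand that pairs base-for-base with seq, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def are_complementary(seq_a, seq_b):
    """True if seq_b pairs with seq_a over its full length (antiparallel strands)."""
    return reverse_complement(seq_a) == seq_b.upper()


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(are_complementary("ATGC", "GCAT"))
```

Modified bases, RNA and partial complementarity are outside the scope of this sketch.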

Reference to "10 oligonucleotides" refers to 10 
different oligonucleotides. Whilst a Table 1 
oligonucleotide, a Table 1 derived oligonucleotide and
their functional equivalent are considered different 
oligonucleotides, complementary oligonucleotides are not 
considered different. Preferably however, the at least 
10 oligonucleotides are 10 different Table 1 
oligonucleotides (or Table 1 derived oligonucleotides or
their functional equivalents). Thus said 10 different
oligonucleotides are preferably able to bind to 10
different transcripts.

Preferably said oligonucleotides are as described 
in Table 1 or are derived from a sequence described in
Table 1. Especially preferably said oligonucleotides
are as described in Table 2 or Table 4 or are derived
from a sequence described in either of those tables.
Especially preferably the oligonucleotide (or the
oligonucleotide derived therefrom) has a high occurrence
as defined in Table 3, especially preferably >40%, e.g.
>80% or >90%, e.g. 100%.

A "set" as described refers to a collection of 
unique oligonucleotide probes (ie. having a distinct 
sequence) and preferably consists of less than 1000 
oligonucleotide probes, especially less than 500 probes, 
e.g. preferably from 10 to 500, e.g. 10 to 100, 200 or 
300, especially preferably 20 to 100, e.g. 30 to 100 
probes. In some cases fewer than 10 probes may be used,
e.g. from 2 to 9 probes, e.g. 5 to 9 probes. 

It will be appreciated that increasing the number 
of probes will prevent the possibility of poor analysis, 
e.g. misdiagnosis by comparison to other diseases which 
could similarly alter the expression of the particular 
genes in question. Other oligonucleotide probes not 
described herein may also be present, particularly if 
they aid the ultimate use of the set of oligonucleotide 
probes. However, preferably said set consists only of 
said Table 1 oligonucleotides, Table 1 derived 
oligonucleotides, complementary sequences or 
functionally equivalent oligonucleotides, or a sub-set 
thereof (e.g. of the size as described above),
preferably a sub-set for which sequences are provided
herein (see Table 1 and its footnote). Especially
preferably said set consists only of said Table 1
oligonucleotides, Table 1 derived oligonucleotides, or
complementary sequences thereof, or a sub-set thereof.

Multiple copies of each unique oligonucleotide 
probe, e.g. 10 or more copies, may be present in each 
set, but constitute only a single probe. 
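
The counting rules for a set (multiple copies constitute a single probe, and complementary oligonucleotides are not considered different) can be illustrated as follows. This Python sketch is an assumption-laden simplification: it treats probes as plain DNA strings, takes no account of functional equivalence, and the sequences and function names are invented for the example.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complementary strand of a DNA sequence, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical(seq):
    """Give a probe and its complement the same key (the alphabetically smaller)."""
    seq = seq.upper()
    return min(seq, reverse_complement(seq))


def count_distinct_probes(probes):
    """Count unique probes, ignoring repeated copies and complementary strands."""
    return len({canonical(p) for p in probes})


if __name__ == "__main__":
    # Hypothetical sequences: a duplicate copy, its complement, and one other probe.
    example = ["ATGCCG", "ATGCCG", "CGGCAT", "TTGACA"]
    print(count_distinct_probes(example))
```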

A set of oligonucleotide probes, which may 
preferably be immobilized on a solid support or have
means for such immobilization, comprises the at least 10
oligonucleotide probes selected from those described
hereinbefore. Especially preferably said probes are
selected from those having high occurrence as described 
in Table 3 and as mentioned above. As mentioned above, 
these 10 probes must be unique and have different 
sequences. Having said this however, two separate
probes may be used which recognize the same gene but
reflect different splicing events. However,
oligonucleotide probes which are complementary to, and
bind to, distinct genes are preferred.

As described herein a "functionally equivalent"
oligonucleotide to those described in Table 1 or derived
therefrom refers to an oligonucleotide which is capable 
of identifying the same gene as an oligonucleotide of 
Table 1 or derived therefrom, ie. it can bind to the 
same mRNA molecule (or DNA) transcribed from a gene
(target nucleic acid molecule) as the Table 1
oligonucleotide or the Table 1 derived oligonucleotide
(or its complementary sequence). Preferably said
functionally equivalent oligonucleotide is capable of
recognizing, ie. binding to, the same splicing product as
a Table 1 oligonucleotide or a Table 1 derived
oligonucleotide. Preferably said mRNA molecule is the
full length mRNA molecule which corresponds to the Table
1 oligonucleotide or the Table 1 derived
oligonucleotide.

As referred to herein "capable of binding" or 
"binding" refers to the ability to hybridize under 
conditions described hereinafter. 

Alternatively expressed, functionally equivalent 
oligonucleotides (or complementary sequences) have
sequence identity or will hybridize, as described
hereinafter, to a region of the target molecule to which
molecule a Table 1 oligonucleotide or a Table 1 derived
oligonucleotide or a complementary oligonucleotide
binds. Preferably, functionally equivalent
oligonucleotides (or their complementary sequences)
hybridize to one of the mRNA sequences which corresponds
to a Table 1 oligonucleotide or a Table 1 derived
oligonucleotide under the conditions described
hereinafter or have sequence identity to a part of one of
the mRNA sequences which corresponds to a Table 1
oligonucleotide or a Table 1 derived oligonucleotide. A
"part" in this context refers to a stretch of at least
5, e.g. at least 10 or 20 bases, such as from 5 to 100,
e.g. 10 to 50 or 15 to 30 bases. 

In a particularly preferred aspect, the
functionally equivalent oligonucleotide binds to all or
a part of the region of a target nucleic acid molecule
(mRNA or cDNA) to which the Table 1 oligonucleotide or
Table 1 derived oligonucleotide binds. A "target"
nucleic acid molecule is the gene transcript or related
product, e.g. mRNA, or cDNA, or amplified product
thereof. Said "region" of said target molecule to which
said Table 1 oligonucleotide or Table 1 derived 
oligonucleotide binds is the stretch over which 
complementarity exists. At its largest this region is 
the whole length of the Table 1 oligonucleotide or Table
1 derived oligonucleotide, but may be shorter if the 
entire Table 1 sequence or Table 1 derived 
oligonucleotide is not complementary to a region of the 
target sequence.

Preferably said part of said region of said target
molecule is a stretch of at least 5, e.g. at least 10 or
20 bases, such as from 5 to 100, e.g. 10 to 50 or 15 to
30 bases. This may for example be achieved by said
functionally equivalent oligonucleotide having several
identical bases to the bases of the Table 1
oligonucleotide or the Table 1 derived oligonucleotide.
These bases may be identical over consecutive stretches, 
e.g. in a part of the functionally equivalent 
oligonucleotide, or may be present non-consecutively,
but provide sufficient complementarity to allow binding
to the target sequence. 

Thus in a preferred feature, said functionally
equivalent oligonucleotide hybridizes under conditions
of high stringency to a Table 1 oligonucleotide or a 
Table 1 derived oligonucleotide or the complementary 
sequence thereof. Alternatively expressed, said 
functionally equivalent oligonucleotide exhibits high 
sequence identity to all or part of a Table 1 
oligonucleotide. Preferably said functionally 
equivalent oligonucleotide has at least 70% sequence 
identity, preferably at least 80%, e.g. at least 90, 95,
98 or 99%, to all of a Table 1 oligonucleotide or a part
thereof. As used in this context, a "part" refers to a
stretch of at least 5, e.g. at least 10 or 20 bases,
such as from 5 to 100, e.g. 10 to 50 or 15 to 30 bases,
in said Table 1 oligonucleotide. Especially preferably
when sequence identity to only a part of said Table 1
oligonucleotide is present, the sequence identity is 
high, e.g. at least 80% as described above. 

Functionally equivalent oligonucleotides which 
satisfy the above stated functional requirements include 
those which are derived from the Table 1
oligonucleotides and also those which have been modified
by single or multiple nucleotide base (or equivalent) 
substitution, addition and/or deletion, but which 
nonetheless retain functional activity, e.g. bind to the 
same target molecule as the Table 1 oligonucleotide or 
the Table 1 derived oligonucleotide from which they are 
further derived or modified. Preferably said 
modification is of from 1 to 50, e.g. from 10 to 30, 
preferably from 1 to 5 bases. Especially preferably 
only minor modifications are present, e.g. variations in 
less than 10 bases, e.g. less than 5 base changes. 

Within the meaning of "addition" equivalents are
included oligonucleotides containing additional
sequences which are complementary to the consecutive
stretch of bases on the target molecule to which the
Table 1 oligonucleotide or the Table 1 derived
oligonucleotide binds. Alternatively the addition may
comprise a different, unrelated sequence, which may for
example confer a further property, e.g. to provide a 
means for immobilization such as a linker to bind the 
oligonucleotide probe to a solid support.

Particularly preferred are naturally occurring
equivalents such as biological variants, e.g. allelic,
geographical or allotypic variants, e.g.
oligonucleotides which correspond to a genetic variant,
for example as present in a different species.

Functional equivalents include oligonucleotides
with modified bases, e.g. using non-naturally occurring
bases. Such derivatives may be prepared during 
synthesis or by post-production modification.

"Hybridizing" sequences which bind under conditions
of low stringency are those which bind under
non-stringent conditions (for example, 6x SSC/50% formamide
at room temperature) and remain bound when washed under
conditions of low stringency (2x SSC, room temperature,
more preferably 2x SSC, 42°C). Hybridizing under high
stringency refers to the above conditions in which
washing is performed at 2x SSC, 65°C (where SSC = 0.15M
NaCl, 0.015M sodium citrate, pH 7.2).
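
For convenience, the salt concentrations implied by the SSC multiples above can be worked out from the stated 1x composition. The following Python sketch is illustrative only; the dictionary layout and function name are assumptions, and only the 1x values and the fold concentrations are taken from the text.

```python
# 1x SSC composition in mol/l, as defined in the text.
SSC_1X = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_molarity(fold):
    """Return the molar concentration of each salt in an n-fold SSC solution."""
    return {salt: fold * conc for salt, conc in SSC_1X.items()}


if __name__ == "__main__":
    for fold in (2, 6):
        salts = ssc_molarity(fold)
        summary = ", ".join(f"{conc:.3f} M {salt}" for salt, conc in salts.items())
        print(f"{fold}x SSC: {summary}")
```

Thus the 2x SSC wash contains 0.30 M NaCl and 0.030 M sodium citrate, and 6x SSC contains 0.90 M NaCl and 0.090 M sodium citrate.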

"Sequence identity" as referred to herein refers to 
the value obtained when assessed using ClustalW
(Thompson et al., 1994, Nucl. Acids Res., 22, p4673-4680)
with the following parameters:
Pairwise alignment parameters - Method: accurate,
Matrix: IUB, Gap open penalty: 15.00, Gap extension
penalty: 6.66;
Multiple alignment parameters - Matrix: IUB, Gap open
penalty: 15.00, % identity for delay: 30, Negative
matrix: no, Gap extension penalty: 6.66, DNA transitions
weighting: 0.5.

Sequence identity at a particular base is intended 
to include identical bases which have simply been
derivatized. 
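
The percentage figure itself can be illustrated with a short calculation. The Python sketch below is not ClustalW and does not perform alignment: it assumes the two sequences have already been aligned (with "-" marking gaps) and simply counts identical positions. The function names, the gap symbol and the choice of denominator (all aligned columns in which at least one sequence has a base) are assumptions for the example and may differ from the value ClustalW reports.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of aligned columns in which both sequences carry the same base."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # a column that is a gap in both sequences carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a, aligned_b, threshold=70.0):
    """Hypothetical helper applying the 'at least 70%' criterion from the text."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Invented sequences differing at one of ten positions.
    print(percent_identity("ATGCATGCAT", "ATGCATGGAT"))
```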

The invention also extends to polypeptides encoded
by the mRNA sequence to which a Table 1 oligonucleotide
or a Table 1 derived oligonucleotide binds. The
invention further extends to antibodies which bind to
any of said polypeptides.

As described above, conveniently said set of
oligonucleotide probes may be immobilized on one or more
solid supports. Single or preferably multiple copies of 
each unique probe are attached to said solid supports, 
e.g. 10 or more, e.g. at least 100 copies of each unique
probe are present.

One or more unique oligonucleotide probes may be 
associated with separate solid supports which together 
form a set of probes immobilized on multiple solid
supports, e.g. one or more unique probes may be
immobilized on multiple beads, membranes, filters,
biochips etc. which together form a set of probes, and which
together form modules of the kit described hereinafter.
The solid supports of the different modules are
conveniently physically associated although the signals
associated with each probe (generated as described
hereinafter) must be separately determinable.

Alternatively, the probes may be immobilized on 
discrete portions of the same solid support, e.g. each 
unique oligonucleotide probe, e.g. in multiple copies, 
may be immobilized to a distinct and discrete portion or
region of a single filter or membrane, e.g. to generate 
an array. 

A combination of such techniques may also be used, 
e.g. several solid supports may be used which each 
immobilize several unique probes.

The expression "solid support" shall mean any solid 
material able to bind oligonucleotides by hydrophobic, 
ionic or covalent bridges.

"Immobilization" as used herein refers to
reversible or irreversible association of the probes to
said solid support by virtue of such binding. If
reversible, the probes remain associated with the solid
support for a time sufficient for methods of the
invention to be carried out. 

Numerous solid supports suitable as immobilizing 
moieties according to the invention, are well known in 
the art and widely described in the literature and 
generally speaking, the solid support may be any of the 
well-known supports or matrices which are currently 
widely used or proposed for immobilization, separation 
etc. in chemical or biochemical procedures. Such 
materials include, but are not limited to, any synthetic 
organic polymer such as polystyrene, polyvinylchloride,
polyethylene; or nitrocellulose and cellulose acetate; 
or tosyl activated surfaces; or glass or nylon or any 
surface carrying a group suited for covalent coupling of 
nucleic acids. The immobilizing moieties may take the
form of particles, sheets, gels, filters, membranes,
microfibre strips, tubes or plates, fibres or
capillaries, made for example of a polymeric material
e.g. agarose, cellulose, alginate, teflon, latex or
polystyrene or magnetic beads. Solid supports allowing
the presentation of an array, preferably in a single
dimension are preferred, e.g. sheets, filters,
membranes, plates or biochips.

Attachment of the nucleic acid molecules to the 
solid support may be performed directly or indirectly.
For example if a filter is used, attachment may be 
performed by UV-induced crosslinking. Alternatively, 
attachment may be performed indirectly by the use of an 
attachment moiety carried on the oligonucleotide probes 
and/or solid support. Thus for example, a pair of
affinity binding partners may be used, such as avidin,
streptavidin or biotin, DNA or DNA binding protein (e.g.
either the lac I repressor protein or the lac operator
sequence to which it binds), antibodies (which may be
mono- or polyclonal), antibody fragments or the epitopes
or haptens of antibodies. In these cases, one partner 
of the binding pair is attached to (or is inherently
part of) the solid support and the other partner is
attached to (or is inherently part of) the nucleic acid
molecules.

WO 2004/046382    PCT/GB2003/005102

As used herein an "affinity binding pair" refers to 
two components which recognize and bind to one another
specifically (ie. in preference to binding to other
molecules). Such binding pairs when bound together form
a complex. 

Attachment of appropriate functional groups to the 
solid support may be performed by methods well known in
the art, which include for example, attachment through
hydroxyl, carboxyl, aldehyde or amino groups which may
be provided by treating the solid support to provide
suitable surface coatings. Solid supports presenting
appropriate moieties for attachment of the binding
partner may be produced by routine methods known in the
art.

Attachment of appropriate functional groups to the 
oligonucleotide probes of the invention may be performed 
by ligation or introduced during synthesis or
amplification, for example using primers carrying an
appropriate moiety, such as biotin or a particular
sequence for capture.

Conveniently, the set of probes described 
hereinbefore is provided in kit form.

Thus viewed from a further aspect the present 
invention provides a kit comprising a set of 
oligonucleotide probes as described hereinbefore 
immobilized on one or more solid supports.

Preferably, said probes are immobilized on a single
solid support and each unique probe is attached to a
different region of said solid support. However, when 
attached to multiple solid supports, said multiple solid 
supports form the modules which make up the kit. 
Especially preferably said solid support is a sheet,
filter, membrane, plate or biochip. 

Optionally the kit may also contain information
relating to the signals generated by normal or diseased
samples (as discussed in more detail hereinafter in
relation to the use of the kits), standardizing
materials, e.g. mRNA or cDNA from normal and/or diseased
samples for comparative purposes, labels for
incorporation into cDNA, adapters for introducing
nucleic acid sequences for amplification purposes,
primers for amplification and/or appropriate enzymes,
buffers and solutions. Optionally said kit may also
contain a package insert describing how the method of
the invention should be performed, optionally providing
standard graphs, data or software for interpretation of
results obtained when performing the invention.

The use of such kits to prepare a standard
diagnostic gene transcript pattern as described
hereinafter forms a further aspect of the invention.

The set of probes as described herein have various 
uses. Principally however they are used to assess the 
gene expression state of a test cell to provide 
information relating to the organism from which said
cell is derived. Thus the probes are useful in 
diagnosing, identifying or monitoring a disease or 
condition or stage thereof in an organism. 

Thus in a further aspect the invention provides the 
use of a set of oligonucleotide probes or a kit as
described hereinbefore to determine the gene expression
pattern of a cell which pattern reflects the level of
gene expression of genes to which said oligonucleotide
probes bind, comprising at least the steps of:

a) isolating mRNA from said cell, which may
optionally be reverse transcribed to cDNA;

b) hybridizing the mRNA or cDNA of step (a) to a
set of oligonucleotide probes or a kit as defined
herein; and

c) assessing the amount of mRNA or cDNA hybridizing
to each of said probes to produce said pattern.
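
The outcome of step (c) can be pictured as a list of numbers, one per probe, held in a fixed probe order so that patterns from different samples can be compared position by position. The following Python sketch is illustrative only; the probe identifiers, signal values and function names are hypothetical, and no normalization or background correction is attempted.

```python
def expression_pattern(signals, probe_order):
    """Arrange per-probe hybridization signals as an ordered array of numbers."""
    missing = [p for p in probe_order if p not in signals]
    if missing:
        raise KeyError(f"no signal measured for probes: {missing}")
    return [float(signals[p]) for p in probe_order]


def as_table(pattern, probe_order):
    """Render a pattern in tabular form, one row per probe."""
    lines = [f"{'probe':<10}{'signal':>10}"]
    lines += [f"{p:<10}{v:>10.1f}" for p, v in zip(probe_order, pattern)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical probe identifiers and signal intensities.
    probe_order = ["probe_1", "probe_2", "probe_3"]
    signals = {"probe_2": 87.5, "probe_1": 412.0, "probe_3": 1530.25}
    pattern = expression_pattern(signals, probe_order)
    print(as_table(pattern, probe_order))
```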

The mRNA and cDNA as referred to in this method,
and the methods hereinafter, encompass derivatives or
copies of said molecules, e.g. copies of such molecules
such as those produced by amplification or the
preparation of complementary strands, but which retain
the identity of the mRNA sequence, ie. would hybridize
to the direct transcript (or its complementary sequence) 
by virtue of precise complementarity, or sequence 
identity, over at least a region of said molecule. It 
will be appreciated that complementarity will not exist 
over the entire region where techniques have been used 
which may truncate the transcript or introduce new 
sequences, e.g. by primer amplification. For 
convenience, said mRNA or cDNA is preferably amplified 
prior to step b) . As with the oligonucleotides 
15 described herein said molecules may be modified, e.g. by 
using non-natural bases during synthesis providing 
complementarity remains. Such molecules may also carry 
additional moieties such as signalling or immobilizing 
means . 

The various steps involved in the method of
preparing such a pattern are described in more detail
hereinafter.

As used herein "gene expression" refers to 
transcription of a particular gene to produce a specific
mRNA product (i.e. a particular splicing product). The
level of gene expression may be determined by assessing
the level of transcribed mRNA molecules or cDNA
molecules reverse transcribed from the mRNA molecules or
products derived from those molecules, e.g. by
amplification.

The "pattern" created by this technique refers to 
information which, for example, may be represented in 
tabular or graphical form and conveys information about 
the signal associated with two or more oligonucleotides.
Preferably said pattern is expressed as an array of
numbers relating to the expression level associated with
each probe.

Preferably, said pattern is established using the
following linear model:

y = Xb + f          (Equation 1)

wherein X is the matrix of gene expression data, y
is the response variable, b is the regression
coefficient vector and f the estimated residual vector.
Although many different methods can be used to establish
the relationship provided in equation 1, especially
preferably the Partial Least Squares Regression (PLSR)
method is used for establishing the relationship in
equation 1.
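
By way of illustration only, the fitting of the linear model y = Xb + f may be sketched in Python as follows. The sketch assumes a single response variable (PLS1) and the NIPALS algorithm, which is one common way of carrying out PLSR and is not stated in this description to be the procedure used. The function names pls1_fit and pls1_predict, the number of components and the example data are hypothetical.

```python
from math import sqrt


def pls1_fit(X, y, n_components):
    """Fit y = Xb + f by PLS1 (NIPALS). X is a samples x probes matrix."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre the expression matrix and the response.
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weight vector: direction in probe space covarying most with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left in y that X can explain
        w = [v / norm for v in w]
        # Scores, probe loadings and response loading for this component.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}


def pls1_predict(model, x):
    """Predict the response for one new expression pattern x."""
    e = [x[j] - model["x_mean"][j] for j in range(len(x))]
    y_hat = model["y_mean"]
    for w, load, q in model["components"]:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * load[j] for j in range(len(e))]
    return y_hat


# Hypothetical data: 4 samples x 2 probes, with y = 2*x1 - x2 + 1 exactly.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [1.0, 4.0, 3.0, 6.0]
model = pls1_fit(X, y, n_components=2)
prediction = pls1_predict(model, [3.0, 1.0])
```

In this noise-free example the number of components equals the number of probes, so the fit coincides with ordinary least squares. With real expression data far fewer components than probes would normally be retained.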

The probes are thus used to generate a pattern 
which reflects the gene expression of a cell at the time 
of its isolation. The pattern of expression is
characteristic of the circumstances under which that
cell finds itself and depends on the influences to
which the cell has been exposed. Thus, a characteristic
gene transcript pattern standard or fingerprint
(standard probe pattern) for cells from an individual
with a particular disease or condition may be prepared
and used for comparison to transcript patterns of test
cells. This has clear applications in diagnosing,
monitoring or identifying whether an organism is
suffering from a particular disease, condition or stage
thereof.

The standard pattern is prepared by determining the 
extent of binding of total mRNA (or cDNA or related 
product), from cells from a sample of one or more 
organisms with the disease or condition or stage
thereof, to the probes. This reflects the level of
transcripts which are present which correspond to each
unique probe. The amount of nucleic acid material which
binds to the different probes is assessed and this
information together forms the gene transcript pattern
standard of that disease or condition or stage thereof.
Each such standard pattern is characteristic of the 
disease, condition or stage thereof.

In a further aspect therefore, the present
invention provides a method of preparing a standard gene
transcript pattern characteristic of a disease or
condition or stage thereof in an organism comprising at
least the steps of:

a) isolating mRNA from the cells of a sample of one
or more organisms having the disease or condition or
stage thereof, which may optionally be reverse
transcribed to cDNA;

b) hybridizing the mRNA or cDNA of step (a) to a
set of oligonucleotides or a kit as described
hereinbefore specific for said disease or condition or
stage thereof in an organism and sample thereof
corresponding to the organism and sample thereof under
investigation; and

c) assessing the amount of mRNA or cDNA hybridizing
to each of said probes to produce a characteristic
pattern reflecting the level of gene expression of genes
to which said oligonucleotides bind, in the sample with
the disease, condition or stage thereof.

For convenience, said oligonucleotides are 
preferably immobilized on one or more solid supports. 

The standard pattern for a great number of diseases 
or conditions and different stages thereof using 
particular probes may be accumulated in databases and be
made available to laboratories on request. 

"Disease" samples and organisms as referred to 
herein refer to organisms (or samples from the same) 
with an underlying pathological disturbance relative to 
a normal organism (or sample), in a symptomatic or
asymptomatic organism, which may result, for example,
from infection or an acquired or congenital genetic
imperfection. Such organisms are known to have, or
exhibit, the disease or condition or stage thereof
under study.

A "condition" refers to a state of the mind or body 
of an organism which has not occurred through disease,
e.g. the presence of an agent in the body such as a
toxin, drug or pollutant, or pregnancy.

"Stages" thereof refer to different stages of the 
disease or condition which may or may not exhibit 
particular physiological or metabolic changes, but do
exhibit changes at the genetic level which may be
detected as altered gene expression. It will be
appreciated that during the course of a disease or
condition the expression of different transcripts may
vary. Thus at different stages, altered expression may
not be exhibited for particular transcripts compared to
"normal" samples. However, combining information from
several transcripts which exhibit altered expression at
one or more stages through the course of the disease or
condition can be used to provide a characteristic
pattern which is indicative of a particular stage of the
disease or condition. Thus for example different stages
in cancer, e.g. pre-stage I, stage I, stage II, III or IV
can be identified.

"Normal" as used herein refers to organisms or
samples which are used for comparative purposes.
Preferably, these are "normal" in the sense that they do
not exhibit any indication of, or are not believed to
have, any disease or condition that would affect gene
expression, particularly in respect of the disease for
which they are to be used as the normal standard.
However, it will be appreciated that different stages of
a disease or condition may be compared and in such
cases, the "normal" sample may correspond to the earlier
stage of the disease or condition.

As used herein a "sample" refers to any material
obtained from the organism, e.g. human or non-human
animal, under investigation which contains cells and
includes tissues, body fluid or body waste or, in the
case of prokaryotic organisms, the organism itself.
"Body fluids" include blood, saliva, spinal fluid,
semen, lymph. "Body waste" includes urine, expectorated
matter (pulmonary patients), faeces etc. "Tissue
samples" include tissue obtained by biopsy, by surgical
interventions or by other means, e.g. placenta.
Preferably however, the samples which are examined are
from areas of the body not apparently affected by the
disease or condition. The cells in such samples are not
disease cells, e.g. cancer cells, have not been in
contact with such disease cells and do not originate
from the site of the disease or condition. The "site of
disease" is considered to be that area of the body which
manifests the disease in a way which may be objectively
determined, e.g. a tumour or area of inflammation. Thus
for example peripheral blood may be used for the
diagnosis of non-haematopoietic cancers, and the blood
does not require the presence of malignant or
disseminated cells from the cancer in the blood.
Similarly in diseases of the brain, in which no diseased
cells are found in the blood due to the blood-brain
barrier, peripheral blood may still be used in the
methods of the invention.

It will however be appreciated that the method of 
preparing the standard transcription pattern and other 
methods of the invention are also applicable for use on 
living parts of eukaryotic organisms such as cell lines 
and organ cultures and explants.

As used herein, reference to "corresponding" sample 
etc. refers to cells preferably from the same tissue, 
body fluid or body waste, but also includes cells from 
tissue, body fluid or body waste which are sufficiently 
similar for the purposes of preparing the standard or
test pattern. When used in reference to genes
"corresponding" to the probes, this refers to genes
which are related by sequence (which may be
complementary) to the probes although the probes may
reflect different splicing products of expression.

"Assessing" as used herein refers to both 
quantitative and qualitative assessment which may be
determined in absolute or relative terms.

The invention may be put into practice as follows.
To prepare a standard transcript pattern for a
particular disease, condition or stage thereof, sample
mRNA is extracted from the cells of tissues, body fluid
or body waste according to known techniques (see for
example Sambrook et al. (1989), Molecular Cloning: A
Laboratory Manual, 2nd Ed., Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, N.Y.) from a
diseased individual or organism.

Owing to the difficulties in working with RNA, the 
RNA is preferably reverse transcribed at this stage to 
form first strand cDNA. Cloning of the cDNA or 
selection from, or using, a cDNA library is not however 
necessary in this or other methods of the invention. 
Preferably, the complementary strands of the first
strand cDNAs are synthesized, i.e. second strand cDNAs,
but this will depend on which relative strands are 
present in the oligonucleotide probes. The RNA may 
however alternatively be used directly without reverse 
transcription and may be labelled if so required. 

Preferably the cDNA strands are amplified by known 
amplification techniques such as the polymerase chain 
reaction (PCR) by the use of appropriate primers. 
Alternatively, the cDNA strands may be cloned with a
vector, used to transform a bacterium such as E. coli
which may then be grown to multiply the nucleic acid
molecules. When the sequence of the cDNAs is not
known, primers may be directed to regions of the nucleic
acid molecules which have been introduced. Thus for
example, adapters may be ligated to the cDNA molecules
and primers directed to these portions for amplification
of the cDNA molecules. Alternatively, in the case of
eukaryotic samples, advantage may be taken of the polyA
tail and cap of the RNA to prepare appropriate primers.

To produce the standard diagnostic gene transcript
pattern or fingerprint for a particular disease or
condition or stage thereof, the above described
oligonucleotide probes are used to probe mRNA or cDNA of
the diseased sample to produce a signal for
hybridization to each particular oligonucleotide probe
species, i.e. each unique probe. A standard control gene
transcript pattern may also be prepared if desired using
mRNA or cDNA from a normal sample. Thus, mRNA or cDNA
is brought into contact with the oligonucleotide probe
under appropriate conditions to allow hybridization.

When multiple samples are probed, this may be
performed consecutively using the same probes, e.g. on
one or more solid supports, i.e. on probe kit modules, or
by simultaneously hybridizing to corresponding probes,
e.g. the modules of a corresponding probe kit.

To identify when hybridization occurs and obtain an
indication of the number of transcripts/cDNA molecules
which become bound to the oligonucleotide probes, it is
necessary to identify a signal produced when the
transcripts (or related molecules) hybridize (e.g. by
detection of double stranded nucleic acid molecules or
detection of the number of molecules which become bound,
after removing unbound molecules, e.g. by washing).

In order to achieve a signal, either or both
components which hybridize (i.e. the probe and the
transcript) carry or form a signalling means or a part
thereof. This "signalling means" is any moiety capable
of direct or indirect detection by the generation or
presence of a signal. The signal may be any detectable
physical characteristic such as conferred by radiation
emission, scattering or absorption properties, magnetic
properties, or other physical properties such as charge,
size or binding properties of existing molecules (e.g.
labels) or molecules which may be generated (e.g. gas
emission etc.). Techniques are preferred which allow
signal amplification, e.g. which produce multiple signal
events from a single active binding site, e.g. by the
catalytic action of enzymes to produce multiple
detectable products.

Conveniently the signalling means may be a label 
which itself provides a detectable signal. Conveniently 
this may be achieved by the use of a radioactive or 
other label which may be incorporated during cDNA 
production, the preparation of complementary cDNA 
strands, during amplification of the target mRNA/cDNA or 
added directly to target nucleic acid molecules. 

Appropriate labels are those which directly or 
indirectly allow detection or measurement of the 
presence of the transcripts/cDNA. Such labels include
for example radiolabels, chemical labels, for example
chromophores or fluorophores (e.g. dyes such as
fluorescein and rhodamine), or reagents of high electron
density such as ferritin, haemocyanin or colloidal gold. 
Alternatively, the label may be an enzyme, for example 
peroxidase or alkaline phosphatase, wherein the presence 
of the enzyme is visualized by its interaction with a 
suitable entity, for example a substrate. The label may 
also form part of a signalling pair wherein the other 
member of the pair is found on, or in close proximity 
to, the oligonucleotide probe to which the 
transcript/cDNA binds, for example, a fluorescent
compound and a quench fluorescent substrate may be used. 
A label may also be provided on a different entity, such 
as an antibody, which recognizes a peptide moiety 
attached to the transcripts/cDNA, for example attached 
to a base used during synthesis or amplification. 

A signal may be achieved by the introduction of a 
label before, during or after the hybridization step. 
Alternatively, the presence of hybridizing transcripts 
may be identified by other physical properties, such as 
their absorbance, in which case the signalling means
is the complex itself. 

The amount of signal associated with each 
oligonucleotide probe is then assessed. The assessment 
may be quantitative or qualitative and may be based on
binding of a single transcript species (or related cDNA
or other products) to each probe, or binding of multiple
transcript species to multiple copies of each unique
probe. It will be appreciated that quantitative results
will provide further information for the transcript
fingerprint of the disease which is compiled. This data
may be expressed as absolute values (in the case of
macroarrays) or may be determined relative to a
particular standard or reference, e.g. a normal control
sample.
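
As a hedged illustration of expressing data relative to a reference, the following Python sketch computes, for each probe, the base-2 logarithm of the ratio of the signal in a disease sample to the signal in a normal control sample. The use of a log ratio, the function name relative_pattern, the floor parameter and the probe names and values are assumptions made for the example only.

```python
from math import log2


def relative_pattern(test_signals, reference_signals, floor=1e-9):
    """Return log2(test / reference) for every probe.

    Both arguments map a probe identifier to a non-negative signal.
    `floor` (an assumption) avoids division by zero and log of zero.
    """
    if test_signals.keys() != reference_signals.keys():
        raise ValueError("both samples must be measured on the same probes")
    return {
        probe: log2(max(test_signals[probe], floor)
                    / max(reference_signals[probe], floor))
        for probe in test_signals
    }


# Hypothetical signal values for three probes.
disease_sample = {"probe_A": 800.0, "probe_B": 100.0, "probe_C": 250.0}
normal_control = {"probe_A": 200.0, "probe_B": 400.0, "probe_C": 250.0}
pattern = relative_pattern(disease_sample, normal_control)
```

A positive value indicates a higher signal in the disease sample than in the control, a negative value a lower signal, and zero no difference.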

Furthermore it will be appreciated that the
standard diagnostic gene transcript pattern may be
prepared using one or more disease samples (and normal
samples if used) to perform the hybridization step to
obtain patterns not biased towards a particular
individual's variations in gene expression.
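
One simple way of combining several samples into a single standard pattern, sketched here in Python under stated assumptions, is to take the mean signal per probe across individuals and to record the sample standard deviation as a measure of between-individual variation. Averaging is an assumption of this sketch, and the function name pooled_standard_pattern and the values are hypothetical.

```python
from statistics import mean, stdev


def pooled_standard_pattern(sample_patterns):
    """Combine per-individual patterns (one list of probe signals each)."""
    n_probes = len(sample_patterns[0])
    if any(len(p) != n_probes for p in sample_patterns):
        raise ValueError("every sample must cover the same probes")
    columns = list(zip(*sample_patterns))  # one tuple of values per probe
    means = [mean(col) for col in columns]
    # stdev needs at least two samples; report 0.0 for a single sample.
    spreads = [stdev(col) if len(col) > 1 else 0.0 for col in columns]
    return means, spreads


# Hypothetical signals for three probes measured in three individuals.
individual_patterns = [
    [10.0, 4.0, 7.0],
    [12.0, 6.0, 7.0],
    [14.0, 8.0, 7.0],
]
standard_means, standard_spreads = pooled_standard_pattern(individual_patterns)
```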

The use of the probes to prepare standard patterns 
and the standard diagnostic gene transcript patterns 
thus produced for the purpose of identification or 
diagnosis or monitoring of a particular disease or
condition or stage thereof in a particular organism 
forms a further aspect of the invention. 

Once a standard diagnostic fingerprint or pattern 
has been determined for a particular disease or 
condition using the selected oligonucleotide probes,
this information can be used to identify the presence,
absence or extent or stage of that disease or condition 
in a different test organism or individual. 

To examine the gene expression pattern of a test
sample, a test sample of tissue, body fluid or body
waste containing cells, corresponding to the sample used
for the preparation of the standard pattern, is obtained
from a patient or the organism to be studied. A test
gene transcript pattern is then prepared as described
hereinbefore as for the standard pattern.

In a further aspect therefore, the present
invention provides a method of preparing a test gene
transcript pattern comprising at least the steps of:

a) isolating mRNA from the cells of a sample of
said test organism, which may optionally be reverse
transcribed to cDNA;

b) hybridizing the mRNA or cDNA of step (a) to a
set of oligonucleotides or a kit as described
hereinbefore specific for a disease or condition or
stage thereof in an organism and sample thereof
corresponding to the organism and sample thereof under
investigation; and

c) assessing the amount of mRNA or cDNA hybridizing
to each of said probes to produce said pattern
reflecting the level of gene expression of genes to
which said oligonucleotides bind, in said test sample.

This test pattern may then be compared to one or 
more standard patterns to assess whether the sample 
contains cells having the disease, condition or stage 
thereof.

Thus viewed from a further aspect the present
invention provides a method of diagnosing or identifying
or monitoring a disease or condition or stage thereof in
an organism, comprising the steps of:

a) isolating mRNA from the cells of a sample of
said organism, which may optionally be reverse
transcribed to cDNA;

b) hybridizing the mRNA or cDNA of step (a) to a
set of oligonucleotides or a kit as described
hereinbefore specific for said disease or
condition or stage thereof in an organism and
sample thereof corresponding to the organism
and sample thereof under investigation;

c) assessing the amount of mRNA or cDNA
hybridizing to each of said probes to produce
a characteristic pattern reflecting the level
of gene expression of genes to which said
oligonucleotides bind, in said sample; and

d) comparing said pattern to a standard
diagnostic pattern prepared according to the
method of the invention using a sample from an
organism corresponding to the organism and
sample under investigation to determine the
presence of said disease or condition or a
stage thereof in the organism under
investigation.

The method up to and including step c) is the
preparation of a test pattern as described above.

As referred to herein, "diagnosis" refers to
determination of the presence or existence of a disease
or condition or stage thereof in an organism.
"Monitoring" refers to establishing the extent of a
disease or condition, particularly when an individual is
known to be suffering from a disease or condition, for
example to monitor the effects of treatment or the
development of a disease or condition, e.g. to determine
the suitability of a treatment or provide a prognosis.

The presence of the disease or condition or stage
thereof may be determined by determining the degree of
correlation between the standard and test samples'
patterns. This necessarily takes into account the range
of values which are obtained for normal and diseased
samples. Although this can be established by obtaining
standard deviations for several representative samples
binding to the probes to develop the standard, it will
be appreciated that single samples may be sufficient to
generate the standard pattern to identify a disease if
the test sample exhibits close enough correlation to
that standard. Conveniently, the presence, absence, or
extent of a disease or condition or stage thereof in a
test sample can be predicted by inserting the data
relating to the expression level of informative probes
in the test sample into the standard diagnostic probe
pattern established according to equation 1.
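
The degree of correlation between a standard pattern and a test pattern may, for example, be quantified as sketched below in Python. The choice of the Pearson correlation coefficient, the function name pearson_correlation, the cut-off of 0.9 and the example values are assumptions made for illustration and are not taken from this description.

```python
from math import sqrt


def pearson_correlation(a, b):
    """Pearson correlation between two patterns over the same probes."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("patterns must cover the same probes (at least two)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant pattern has no defined correlation")
    return cov / sqrt(var_a * var_b)


# Hypothetical signals for four probes.
standard_pattern = [1.0, 2.0, 3.0, 4.0]
test_pattern = [2.1, 3.9, 6.2, 7.8]
r = pearson_correlation(standard_pattern, test_pattern)
matches_standard = r >= 0.9  # hypothetical cut-off
```

In practice any cut-off would have to be chosen with regard to the range of values obtained for normal and diseased samples.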

Data generated using the above mentioned methods 
may be analysed using various techniques from the most
basic visual representation (e.g. relating to intensity)
to more complex data manipulation to identify underlying 
patterns which reflect the interrelationship of the 
level of expression of each gene to which the various 
probes bind, which may be quantified and expressed 
mathematically. Conveniently, the raw data thus 
generated may be manipulated by the data processing and 
statistical methods described hereinafter, particularly 
normalizing and standardizing the data and fitting the 
data to a classification model to determine whether said 
test data reflects the pattern of a particular disease, 
condition or stage thereof. 
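
A minimal Python sketch of the normalizing and standardizing steps is given below. It assumes that normalizing means scaling each sample so that its probe signals sum to one, and that standardizing means converting each probe to a z-score across samples. Both interpretations, the function names and the data are assumptions of the sketch; the methods actually used are those described hereinafter.

```python
from statistics import mean, pstdev


def normalise_samples(matrix):
    """Scale each sample (row) so that its probe signals sum to 1."""
    out = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            raise ValueError("a sample with no signal cannot be normalised")
        out.append([v / total for v in row])
    return out


def standardise_probes(matrix):
    """Convert each probe (column) to a z-score across samples."""
    columns = list(zip(*matrix))
    centres = [mean(c) for c in columns]
    # A probe with no variation keeps a scale of 1.0 to avoid dividing by 0.
    scales = [pstdev(c) or 1.0 for c in columns]
    return [
        [(v - m) / s for v, m, s in zip(row, centres, scales)]
        for row in matrix
    ]


# Hypothetical raw signals: 2 samples x 2 probes.
raw = [[2.0, 6.0], [3.0, 1.0]]
normalised = normalise_samples(raw)
standardised = standardise_probes(normalised)
```

The standardised matrix could then be passed to whichever classification model is chosen.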

The methods described herein may be used to 
identify, monitor or diagnose a disease, condition or 
ailment or its stage or progression, for which the 
oligonucleotide probes are informative. "Informative" 
probes as described herein, are those which reflect 
genes which have altered expression in the diseases or 
conditions in question, or particular stages thereof. 
Probes of the invention may not be sufficiently 
informative for diagnostic purposes when used alone, but 
are informative when used as one of several probes to 
provide a characteristic pattern, e.g. in a set as 
described hereinbefore. 

Preferably said probes correspond to genes which 
are systemically affected by said disease, condition or 
stage thereof. Especially preferably said genes, from 
which transcripts are derived which bind to probes of 
the invention, are metabolic or house-keeping genes and 
preferably are moderately or highly expressed. The 
advantage of using probes directed to moderately or 
highly expressed genes is that smaller clinical samples 
are required for generating the necessary gene 
expression data set, e.g. less than 1 ml blood samples.

Furthermore, it has been found that such genes 
which are already being actively transcribed tend to be 
more prone to being influenced, in a positive or
negative way, by new stimuli. In addition, since
transcripts are already being produced at levels which
are generally detectable, small changes in those levels
are readily detectable as, for example, a certain
detectable threshold does not need to be reached.

In preferred methods of the invention, the set of 
probes of the invention are informative for a variety of 
different diseases, conditions or stages thereof. A 
sub-set of the probes disclosed herein may be used for
diagnosis, identification or monitoring of a particular
disease, condition or stage thereof. 

Thus the probes may be used to diagnose or identify 
or monitor any condition, ailment, disease or reaction 
that leads to the relative increase or decrease in the 
activity of informative genes of any or all eukaryotic
or prokaryotic organisms regardless of whether these
changes have been caused by the influence of bacteria,
viruses, prions, parasites, fungi, radiation, natural or
artificial toxins, drugs or allergens, including mental
conditions due to stress, neurosis, psychosis or
deteriorations due to the ageing of the organism, and
conditions or diseases of unknown cause, providing a
sub-set of the probes as described herein are
informative for said disease or condition or stage
thereof.

Such diseases include those which result in
metabolic or physiological changes, such as
fever-associated diseases such as influenza or malaria. Other
diseases which may be detected include for example
yellow fever, sexually transmitted diseases such as
gonorrhea, fibromyalgia, candida-related complex, cancer
(for example of the stomach, lung, breast, prostate
gland, bowel, skin, colon, ovary etc.), Alzheimer's
disease, disease caused by retroviruses such as HIV,
senile dementia, multiple sclerosis and
Creutzfeldt-Jakob disease to mention a few.

The invention may also be used to identify patients
with psychiatric or psychosomatic diseases such as
schizophrenia and eating disorders. Of particular
importance is the use of this method to detect diseases,
conditions, or stages thereof, which are not readily
detectable by known diagnostic methods, such as HIV
which is generally not detectable using known techniques
1 to 4 months following infection. Conditions which may
be identified include for example drug abuse, such as
the use of narcotics, alcohol, steroids or performance
enhancing drugs.

Preferably said disease to be identified or 
monitored is a cancer or a degenerative brain disorder 
(such as Alzheimer's or Parkinson's disease).

In particular, a set of oligonucleotide probes,
wherein said set comprises at least 10 oligonucleotides
selected from:

an oligonucleotide as described in Table 4 or an
oligonucleotide derived therefrom or an
oligonucleotide with a complementary sequence, or a
functionally equivalent oligonucleotide,

may be used for diagnosis or identification or
monitoring the progression of Alzheimer's disease.
Similarly Table 2 probes and Table 2 derived probes and
their functional equivalents may be used to diagnose,
identify or monitor the progression of breast cancer.
Especially preferably the probes used for breast cancer
analysis are selected based on their occurrence as set
forth in Table 3 and as described hereinbefore.

The diagnostic method may be used alone as an 
alternative to other diagnostic techniques or in 
addition to such techniques. For example, methods of 
the invention may be used as an alternative or additive 
diagnostic measure to diagnosis using imaging techniques 
such as Magnetic Resonance Imaging (MRI), ultrasound
imaging, nuclear imaging or X-ray imaging, for example
in the identification and/or diagnosis of tumours. 

The methods of the invention may be performed on
cells from prokaryotic or eukaryotic organisms which may
be any eukaryotic organisms such as human beings, other
mammals and animals, birds, insects, fish and plants,
and any prokaryotic organism such as a bacterium.

Preferred non-human animals on which the methods of 
the invention may be conducted include, but are not 
limited to mammals, particularly primates, domestic 
animals, livestock and laboratory animals. Thus 
preferred animals for diagnosis include mice, rats, 
guinea pigs, cats, dogs, pigs, cows, goats, sheep and
horses. Particularly preferably the disease state or
condition of humans is diagnosed, identified or 
monitored. 

As described above, the sample under study may be
any convenient sample which may be obtained from an
organism. Preferably however, as mentioned above, the
sample is obtained from a site distant to the site of
disease and the cells in such samples are not disease
cells, have not been in contact with such cells and do
not originate from the site of the disease or condition.
In such cases, although preferably absent, the sample
may contain cells which do not fulfil these criteria.
However, since the probes of the invention are concerned
with transcripts whose expression is altered in cells
which do satisfy these criteria, the probes are
specifically directed to detecting changes in transcript
levels in those cells even if in the presence of other,
background cells.

It has been found that the cells from such samples
show significant and informative variations in the gene
expression of a large number of genes. Thus, the same
probe (or several probes) may be found to be informative
in determinations regarding two or more diseases,
conditions or stages thereof by virtue of the particular
level of transcripts binding to that probe or the
interrelationship of the extent of binding to that probe
relative to other probes. As a consequence, it is
possible to use a relatively small number of probes for
screening for multiple disorders or diseases. This has
consequences with regard to the selection of probes,
discussed in relation to random identification of probes
hereinafter, but also for the use of a single set of
probes for more than one diagnosis. Table 9, which
represents preferred probes of the invention, discloses
probes which are informative for both Alzheimer's disease
and breast cancer.

Thus, the present invention also provides sets of 
probes for diagnosing, identifying or monitoring two or 
more diseases, conditions or stages thereof, wherein at 
least one of said probes is suitable for said 
diagnosing, identifying or monitoring at least two of 
said diseases, conditions or stages thereof, and kits
and methods of using the same. Preferably at least 5
probes, e.g. from 5 to 15 probes, are used in at least
two diagnoses.

Thus, in a further preferred aspect, the present
invention provides a method of diagnosis or
identification or monitoring as described hereinbefore
for the diagnosis, identification or monitoring of two
or more diseases, conditions or stages thereof in an
organism, wherein said test pattern produced in step c)
of the diagnostic method is compared in step d) to at
least two standard diagnostic patterns prepared as
described previously, wherein each standard diagnostic
pattern is a pattern generated for a different disease
or condition or stage thereof.
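
The comparison of one test pattern with two or more standard patterns may be sketched in Python as follows. The sketch assumes that closeness is measured by Euclidean distance, which is only one of several possible measures. The function name closest_standard, the labels condition_A and condition_B and all values are hypothetical.

```python
from math import dist  # available from Python 3.8


def closest_standard(test_pattern, standards):
    """Return the label of the nearest standard pattern and all distances.

    `standards` maps a label to a pattern over the same probes, in the
    same order, as `test_pattern`.
    """
    distances = {
        label: dist(test_pattern, pattern)
        for label, pattern in standards.items()
    }
    best = min(distances, key=distances.get)
    return best, distances


# Hypothetical standard patterns for two conditions over three probes.
standards = {
    "condition_A": [1.0, 0.0, 2.0],
    "condition_B": [0.0, 3.0, 1.0],
}
test_pattern = [0.9, 0.2, 2.1]
best_label, all_distances = closest_standard(test_pattern, standards)
```

Returning every distance, and not only the best label, allows a result to be rejected when no standard pattern is sufficiently close.
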

Whilst in a preferred aspect the methods of
assessment concern the development of a gene transcript
pattern from a test sample and comparison of the same to
a standard pattern, the elevation or depression of
expression of certain markers may also be examined by
examining the products of expression and the level of
those products. Thus a standard pattern in relation to
the expressed product may be generated.

In such methods the levels of expression of a set
of polypeptides encoded by the genes to which an
oligonucleotide of Table 1, or a Table 1 derived
oligonucleotide, binds are analysed.

Various diagnostic methods may be used to assess 
the amount of polypeptides (or fragments thereof) which 
are present. The presence or concentration of 
polypeptides may be examined, for example by the use of 
a binding partner to said polypeptide (e.g. an 
antibody), which may be immobilized, to separate said
polypeptide from the sample and the amount of 
polypeptide may then be determined. 

"Fragments" of the polypeptides refers to a 
domain or region of said polypeptide, e.g. an antigenic 
fragment, which is recognizable as being derived from 
said polypeptide to allow binding of a specific binding 
partner. Preferably such a fragment comprises a 
significant portion of said polypeptide and corresponds 
to a product of normal post-synthesis processing.

Thus in a further aspect the present invention 
provides a method of preparing a standard gene 
transcript pattern characteristic of a disease or 
condition or stage thereof in an organism comprising at 
least the steps of: 

a) releasing target polypeptides from a sample of 
one or more organisms having the disease or condition or 
stage thereof; 

b) contacting said target polypeptides with one or 
more binding partners, wherein each binding partner is 
specific to a marker polypeptide (or a fragment thereof) 
encoded by the gene to which an oligonucleotide of Table 
1 (or derived from a sequence described in Table 1) 
binds, to allow binding of said binding partners to said 
target polypeptides, wherein said marker polypeptides 
are specific for said disease or condition thereof in an 
organism and sample thereof corresponding to the 
organism and sample thereof under investigation; and

c) assessing the target polypeptide binding to said
binding partners to produce a characteristic pattern 
reflecting the level of gene expression of genes which 
express said marker polypeptides, in the sample with the 
disease, condition or stage thereof. 

As used herein "target polypeptides" refers to those
polypeptides present in a sample which are to be
detected and "marker polypeptides" are polypeptides
which are encoded by the genes to which Table 1
oligonucleotides or Table 1 derived oligonucleotides
bind. The target and marker polypeptides are identical
or at least have areas of high similarity, e.g. epitopic
regions, to allow recognition and binding of the binding
partner.

"Release" of the target polypeptides refers to 
appropriate treatment of a sample to provide the 
polypeptides in a form accessible for binding of the 
binding partners, e.g. by lysis of cells where these are 
present. The samples used in this case need not 
necessarily comprise cells as the target polypeptides 
may be released from cells into the surrounding tissue 
or fluid, and this tissue or fluid may be analysed, e.g. 
urine or blood. Preferably, however, the samples
described herein as preferred are used. "Binding
partners" comprise the separate entities which together
make an affinity binding pair as described above,
wherein one partner of the binding pair is the target or
marker polypeptide and the other partner binds
specifically to that polypeptide, e.g. an antibody.

Various arrangements may be envisaged for detecting
the amount of binding pairs which form. In its simplest
form, a sandwich type assay, e.g. an immunoassay such as
an ELISA, may be used in which an antibody specific to
the polypeptide and carrying a label (as described
elsewhere herein) may be bound to the binding pair (e.g.
the first antibody:polypeptide pair) and the amount of
label detected.

Other methods as described herein may be similarly
modified for analysis of the protein product of
expression rather than the gene transcript and related
nucleic acid molecules.

Thus a further aspect of the invention provides a
method of preparing a test gene transcript pattern
comprising at least the steps of:

a) releasing target polypeptides from a sample of 
said test organism; 

b) contacting said target polypeptides with one or
more binding partners, wherein each binding partner is
specific to a marker polypeptide (or a fragment thereof)
encoded by the gene to which an oligonucleotide of Table
1 (or derived from a sequence described in Table 1)
binds, to allow binding of said binding partners to said
target polypeptides, wherein said marker polypeptides
are specific for said disease or condition thereof in an
organism and sample thereof corresponding to the
organism and sample thereof under investigation; and

c) assessing the target polypeptide binding to said
binding partners to produce a characteristic pattern
reflecting the level of gene expression of genes which
express said marker polypeptides, in said test sample.

A yet further aspect of the invention provides a
method of diagnosing or identifying or monitoring a
disease or condition or stage thereof in an organism
comprising the steps of:

a) releasing target polypeptides from a sample of 
said organism; 

b) contacting said target polypeptides with one or
more binding partners, wherein each binding partner is
specific to a marker polypeptide (or a fragment thereof)
encoded by the gene to which an oligonucleotide of Table
1 (or derived from a sequence described in Table 1)
binds, to allow binding of said binding partners to said
target polypeptides, wherein said marker polypeptides
are specific for said disease or condition thereof in an
organism and sample thereof corresponding to the
organism and sample thereof under investigation; and

c) assessing the target polypeptide binding to said
binding partners to produce a characteristic pattern
reflecting the level of gene expression of genes which
express said marker polypeptides in said sample; and

d) comparing said pattern to a standard diagnostic
pattern prepared as described hereinbefore using a
sample from an organism corresponding to the organism
and sample under investigation to determine the degree
of correlation indicative of the presence of said
disease or condition or a stage thereof in the organism
under investigation.

The methods of generating standard and test
patterns and diagnostic techniques rely on the use of
informative oligonucleotide probes to generate the gene
expression data. In some cases it will be necessary to
select these informative probes for a particular method,
e.g. to diagnose a particular disease, from a selection
of available probes, e.g. the probes described
hereinbefore (the Table 1 oligonucleotides, the Table 1
derived oligonucleotides, their complementary sequences
and functionally equivalent oligonucleotides). The
following methodology describes a convenient method for
identifying such informative probes, or more
particularly how to select a suitable sub-set of probes
from the probes described herein.

Probes for the analysis of a particular disease or
condition or stage thereof may be identified in a
number of ways known in the prior art, including by
differential expression or by library subtraction (see
for example WO98/49342). As described hereinafter, in
view of the high information content of most
transcripts, as a starting point one may also simply
analyse a random sub-set of mRNA or cDNA species and
pick the most informative probes from that sub-set. The
following method describes the use of immobilized
oligonucleotide probes (e.g. the probes of the
invention) to which mRNA (or related molecules) from
different samples is bound to identify which probes are
the most informative to identify a particular type of
sample, e.g. a disease sample.

The immobilized probes can be derived from various 
unrelated or related organisms; the only requirement is 
that the immobilized probes should bind specifically to 
their homologous counterparts in test organisms. Probes 
can also be derived from commercially available or 
public databases and immobilized on solid supports or, 
as mentioned above, they can be randomly picked and 
isolated from a cDNA library and immobilized on a solid 
support.

The probes immobilised on the solid
support should be long enough to allow for specific
binding to the target sequences. The immobilised probes
can be in the form of DNA, RNA or their modified
products or PNAs (peptide nucleic acids). Preferably,
the probes immobilised should bind specifically to their
homologous counterparts representing highly and
moderately expressed genes in test organisms.
Conveniently the probes which are used are the probes
described herein.

The gene expression pattern of cells in biological
samples can be generated using prior art techniques such
as microarray or macroarray as described below or using
methods described herein. Several technologies have now
been developed for monitoring the expression level of a
large number of genes simultaneously in biological
samples, such as high-density oligoarrays (Lockhart et
al., 1996, Nat. Biotech., 14, p1675-1680), cDNA
microarrays (Schena et al., 1995, Science, 270, p467-470)
and cDNA macroarrays (Maier E et al., 1994, Nucl. Acids
Res., 22, p3423-3424; Bernard et al., 1996, Nucl. Acids
Res., 24, p1435-1442).

In high-density oligoarrays and cDNA microarrays,
hundreds or thousands of probe oligonucleotides or
cDNAs are spotted onto glass slides or nylon membranes,
or synthesized on biochips. The mRNA isolated from the
test and reference samples is labelled by reverse
transcription with a red or green fluorescent dye,
mixed, and hybridised to the microarray. After washing,
the bound fluorescent dyes are detected by a laser,
producing two images, one for each dye. The resulting
ratio of the red and green spots on the two images
provides the information about the changes in expression
levels of genes in the test and reference samples.
Alternatively, single channel or multiple channel
microarray studies can also be performed.
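
As an illustration of the two-colour read-out described above, the following minimal sketch computes a per-spot log2 ratio of test to reference intensity. The function name, the dye assignment (red for test, green for reference), the handling of non-positive signals and the example intensities are all assumptions made for illustration; they are not taken from the cited methods.

```python
import math

def log2_ratios(test_intensity, reference_intensity):
    """Return log2(test/reference) per spot; None where the ratio is undefined."""
    ratios = []
    for t, r in zip(test_intensity, reference_intensity):
        if t > 0 and r > 0:
            ratios.append(math.log2(t / r))
        else:
            # Assumption: non-positive signals are flagged rather than imputed.
            ratios.append(None)
    return ratios

# Hypothetical background-corrected intensities for four spots.
red_test = [800.0, 200.0, 150.0, 0.0]
green_reference = [200.0, 200.0, 600.0, 50.0]
ratios = log2_ratios(red_test, green_reference)
```

A positive value indicates higher expression in the test sample, a negative value lower expression, and zero no change.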

In cDNA macroarrays, different cDNAs are spotted on
a solid support such as nylon membranes in excess in
relation to the amount of test mRNA that can hybridise
to each spot. mRNA isolated from test samples is
radio-labelled by reverse transcription and hybridised to
the immobilised probe cDNA. After washing, the signals
associated with labels hybridising specifically to
immobilised probe cDNA are detected and quantified. The
data obtained in macroarray contains information about
the relative levels of transcripts present in the test
samples. Whilst macroarrays are only suitable to
monitor the expression of a limited number of genes,
microarrays can be used to monitor the expression of
several thousand genes simultaneously and are, therefore,
a preferred choice for large-scale gene expression
studies.

A macroarray technique for generating the gene
expression data set has been used to illustrate the
probe identification method described herein. For this
purpose, mRNA is isolated from samples of interest and
used to prepare labelled target molecules, e.g. mRNA or
cDNA as described above. The labelled target molecules
are then hybridised to probes immobilised on the solid
support. Various solid supports can be used for the
purpose, as described previously. Following
hybridization, unbound target molecules are removed and
signals from target molecules hybridizing to immobilised
probes quantified. If radio labelling is performed, a
PhosphorImager can be used to generate an image file that
can be used to generate a raw data set. Depending on
the nature of label chosen for labelling the target
molecules, other instruments can also be used; for
example, when fluorescence is used for labelling, a
FluorImager can be used to generate an image file from
the hybridised target molecules.

The raw data corresponding to mean intensity,
median intensity, or volume of the signals in each spot
can be acquired from the image file using commercially
available software for image analysis. However, the
acquired data needs to be corrected for background
signals and normalized prior to analysis, since several
factors can affect the quality and quantity of the
hybridising signals. For example, variations in the
quality and quantity of mRNA isolated from sample to
sample, subtle variations in the efficiency of labelling
target molecules during each reaction, and variations in
the amount of unspecific binding between different
macroarrays can all contribute to noise in the acquired
data set that must be corrected for prior to analysis.

Background correction can be performed in several
ways. The lowest pixel intensity within a spot can be
used for background subtraction, or the mean or median of
the line of pixels around the spot's outline can be used
for the purpose. One can also define an area
representing the background intensity based on the
signals generated from negative controls and use the
average intensity of this area for background
subtraction.

The background corrected data can then be
transformed for stabilizing the variance in the data
structure and normalized for the differences in probe
intensity. Several transformation techniques have been
described in the literature and a brief overview can be
found in Cui, Kerr and Churchill
(http://www.jax.org/research/churchill/research/expression/Cui-Transform.pdf).
Normalization can be
performed by dividing the intensity of each spot by
the collective intensity, average intensity or median
intensity of all the spots in a macroarray or a group of
spots in a macroarray in order to obtain the relative
intensity of signals hybridising to immobilised probes
in a macroarray. Several methods have been described
for normalizing gene expression data (Richmond and
Somerville, 2000, Current Opin. Plant Biol., 3, p108-116;
Finkelstein et al., 2001, In "Methods of Microarray
Data Analysis. Papers from CAMDA", Eds. Lin & Johnson,
Kluwer Academic, p57-68; Yang et al., 2001, In "Optical
Technologies and Informatics", Eds. Bittner, Chen,
Dorsel & Dougherty, Proceedings of SPIE, 4266, p141-152;
Dudoit et al., 2000, J. Am. Stat. Ass., 97, p77-87; Alter
et al., 2000, supra; Newton et al., 2001, J. Comp. Biol.,
8, p37-52). Generally, a scaling factor or function is
first calculated to correct the intensity effect and
then used for normalising the intensities. The use of
external controls has also been suggested for improved
normalization.
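
The background subtraction and median normalization described above can be sketched as follows. This is a minimal illustration only: the function names, the use of the mean of negative-control spots as the background estimate, the clipping of negative values to zero and the example intensities are assumptions, not details taken from the text.

```python
from statistics import median

def background_correct(spot_intensities, background):
    """Subtract a background estimate from each spot.
    Assumption: negative results are clipped to zero."""
    return [max(v - background, 0.0) for v in spot_intensities]

def normalize_by_median(spot_intensities):
    """Divide each spot by the array median to obtain relative intensities."""
    m = median(spot_intensities)
    if m == 0:
        raise ValueError("median intensity is zero; cannot normalize")
    return [v / m for v in spot_intensities]

# Hypothetical raw spot intensities from one macroarray.
raw = [120.0, 340.0, 95.0, 560.0, 210.0]
# Hypothetical negative-control spots used to estimate the background.
negative_controls = [18.0, 22.0, 20.0]

bg = sum(negative_controls) / len(negative_controls)
corrected = background_correct(raw, bg)
relative = normalize_by_median(corrected)
```

Dividing by the median rather than the sum is one of the options listed in the text; it is chosen here only because it is less sensitive to a few very bright spots.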

One other major challenge encountered in
large-scale gene expression analysis is that of
standardization of data collected from experiments
performed at different times. We have observed that
gene expression data for samples acquired in the same
experiment can be efficiently compared following
background correction and normalization. However, the
data from samples acquired in experiments performed at
different times requires further standardization prior
to analysis. This is because subtle differences in
experimental parameters between different experiments,
for example, differences in the quality and quantity of
mRNA extracted at different times, differences in time
used for target molecule labelling, hybridization time
or exposure time, can affect the measured values. Also,
factors such as the nature of the sequence of
transcripts under investigation (their GC content) and
their amount in relation to each other determine
how they are affected by subtle variations in the
experimental processes. They determine, for example,
how efficiently first strand cDNAs, corresponding to a
particular transcript, are transcribed and labelled
during first strand synthesis, or how efficiently the
corresponding labelled target molecules bind to their
complementary sequences during hybridization. Batch to
batch difference in the printing process is also a major
factor for variation in the generated expression data.

Failure to properly address and correct for these
influences leads to situations where the differences
between the experimental series may overshadow the main
information of interest contained in the gene expression
data set, i.e. the differences within the combined data
from the different experimental series. Figure 1
provides one such example showing a classification based
on Principal Component Analysis (PCA) of combined data
from two experimental series where the main goal is to
distinguish between Alzheimer/non-Alzheimer patients.

PCA (also known as singular value decomposition) is
a technique for studying interdependencies and
underlying relationships of a set of variables. The
data are modelled in terms of a few significant factors
or principal components (PCs), plus residuals. The
PCs contain the main phenomena and define the
systematic variability present in the data, while the
residuals represent the variability interpreted as
noise. Details on PCA can be found in Jolliffe (1986,
Principal Component Analysis, Springer-Verlag, NY), and
Jackson (1991, A User's Guide to Principal Components,
Wiley, NY). The results of Figure 1 show that two
clusters are formed representing the data from two
experimental series rather than the
Alzheimer/non-Alzheimer differentiation. There were
eight samples in common between the two series of
experiments, which ideally should have fallen on top of,
or in near proximity to, each other if appropriately
standardized.
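
The effect described above, where the experimental series rather than the class dominates the first principal component, can be reproduced on toy data. The sketch below is illustrative only: it assumes NumPy is available, computes PCA scores from the singular value decomposition of the mean-centred data, and uses invented intensities with an assumed additive series effect.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project mean-centred samples (rows of X) onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

# Hypothetical data: 4 samples per series, 3 genes.
class_effect = np.array([0.0, 1.0, 0.0, 1.0])   # assumed disease / non-disease label
base = np.outer(class_effect, [0.5, 0.0, 0.0]) + np.array([5.0, 6.0, 7.0])
series_1 = base
series_2 = base + 3.0                            # assumed additive series effect
X = np.vstack([series_1, series_2])

scores = pca_scores(X)
pc1 = scores[:, 0]   # first principal component score of each sample
```

In this toy example the sign of `pc1` separates series 1 from series 2, while the smaller class effect is pushed into the second component.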

We have now found that gene expression data between
different experiments can be efficiently standardized by
including a subset of samples from one experimental
series in the next experimental series and using a
direct standardization method (DS), originally described
by Wang and Kowalski (Anal. Chem., 1991, 63, p2750 and
J. Chemometrics, 1991, 5, p129-145). Although the
method of DS is well known in the field of analytical
chemistry, it remains undescribed and unused in the
field of gene expression data analysis.

In DS, the secondary data representing, for example,
experimental series 2 (secondary measurements, $R_2$) are
corrected to match the data measured in the primary
series, representing data from series 1 ($R_1$), while
the calibration model remains unchanged. In DS, the
response matrices for both experimental series are
related to each other by a transformation matrix F, i.e.

$$R_1 = R_2 F \qquad (1)$$

where F is a square matrix dimensioned gene by
gene. From (1), the transformation matrix is calculated
as:

$$F = R_2^{+} R_1 \qquad (2)$$

where $R_2^{+}$ denotes the generalized (pseudo-) inverse
of $R_2$. The transformation matrix F in equation (2) is
calculated using a relatively small subset of samples
which are measured in both the primary (master) and the
secondary series of data.

Finally, the response of the unknown sample
measured in the secondary series, $r^T_{2,un}$, is standardized
to the response vector $r^T_{1,un}$ expected from the primary
series:

$$r^T_{1,un} = r^T_{2,un} F \qquad (3)$$

From the preceding equation it can be seen that
column i of the transformation matrix contains the
multiplication factors for a set of genes measured in
the secondary series to obtain the intensity at spot i
of the corrected series.
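
Equations (1) to (3) can be illustrated with a short numerical sketch. It assumes NumPy is available and uses the Moore-Penrose pseudo-inverse (`numpy.linalg.pinv`) as the generalized inverse; the function names, the per-gene gain used to simulate a series effect and all intensities are hypothetical.

```python
import numpy as np

def ds_transform(R1_transfer, R2_transfer):
    """Estimate F so that R2_transfer @ F approximates R1_transfer (equation 2)."""
    return np.linalg.pinv(R2_transfer) @ R1_transfer

def standardize(r2_unknown, F):
    """Map a secondary-series response onto the primary series (equation 3)."""
    return np.asarray(r2_unknown) @ F

# Hypothetical transfer samples: 3 samples x 5 genes, measured in both series.
rng = np.random.default_rng(0)
R1 = rng.uniform(1.0, 10.0, size=(3, 5))
gain = np.array([1.2, 0.8, 1.5, 0.9, 1.1])   # assumed per-gene series effect
R2 = R1 * gain

F = ds_transform(R1, R2)                      # gene-by-gene matrix, here 5 x 5

# A new secondary-series sample lying in the span of the transfer samples.
weights = np.array([0.5, 0.3, 0.2])
r2_new = weights @ R2
r1_expected = weights @ R1
r1_new = standardize(r2_new, F)
```

Because only three transfer samples are used, F can correct only the part of a new sample that lies in the space spanned by those samples, which is consistent with the later requirement that the repeated samples be representative.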

The number of samples that are repeated in the
experimental series, $R_1$ and $R_2$, should be equal to their
ranks, which in this case is equal to the number of
principal components retained for explaining the
variation in $R_1$ and $R_2$. For example, if three
principal components are retained for explaining the
variation in the data set, a minimum of three samples
should be repeated between $R_1$ and $R_2$. The samples that
should be repeated between different series should
ideally be those that exhibit high leverages in the gene
expression pattern. At times, two samples may suffice,
while at other times, more than two samples should
ideally be included for good representativity. In some
cases, the samples selected can be the same in all the
experimental series to be compared (reference samples),
while in other cases, representative samples can be
selected sequentially by analyzing the expression
pattern after each experiment. The selected samples
with high leverages are then included in the next
experimental series. The results of using Direct
Standardization are shown in Figure 1.

Another approach for normalizing and standardizing
the gene expression data set is to hybridize each DNA
array with target molecules prepared from a test sample
and an equal amount of labelled target molecules
prepared from representative reference samples. In
order to measure the intensity of labelled target
molecules hybridizing to the immobilized probes it is
necessary that the labelled molecules are prepared from
test and reference samples using different labels; for
example, different fluorescent dyes can be used for
preparing the labelled material. The labelled molecules
prepared from reference samples can be added to the
hybridization solution together with the labelled
material prepared from test samples. A data file from
each array representing the expression pattern of
different genes in the test sample and reference samples
can then be obtained, normalized and standardized by the
direct standardization method as described above. An
immediate advantage of including the differentially
labelled target molecules from reference samples during
hybridization is that it enables an efficient comparison
of new test samples to the data sets already stored in a
database.

Monitoring the expression of a large number of
genes in several samples leads to the generation of a
large amount of data that is too complex to be easily
interpreted. Several unsupervised and supervised
multivariate data analysis techniques have already been
shown to be useful in extracting meaningful biological
information from these large data sets. Cluster
analysis is by far the most commonly used technique for
gene expression analysis, and has been performed to
identify genes that are regulated in a similar manner,
and/or to identify new/unknown tumour classes using gene
expression profiles (Eisen et al., 1998, PNAS, 95,
p14863-14868; Alizadeh et al., 2000, supra; Perou et al.,
2000, Nature, 406, p747-752; Ross et al., 2000, Nature
Genetics, 24(3), p227-235; Herwig et al., 1999, Genome
Res., 9, p1093-1105; Tamayo et al., 1999, PNAS,
96, p2907-2912).

In the clustering method, genes are grouped into
functional categories (clusters) based on their
expression profile, satisfying two criteria: homogeneity
- the genes in the same cluster are highly similar in
expression to each other; and separation - genes in
different clusters have low similarity in expression to
each other.

Examples of various clustering techniques that have
been used for gene expression analysis include
hierarchical clustering (Eisen et al., 1998, supra;
Alizadeh et al., 2000, supra; Perou et al., 2000, supra;
Ross et al., 2000, supra), K-means clustering (Herwig et
al., 1999, supra; Tavazoie et al., 1999, Nature Genetics,
22(3), p281-285), gene shaving (Hastie et al., 2000,
Genome Biology, 1(2), research 0003.1-0003.21), block
clustering (Tibshirani et al., 1999, Tech. report, Univ.
Stanford), Plaid model (Lazzeroni, 2002, Stat. Sinica,
12, p61-86), and self-organizing maps (Tamayo et al.,
1999, supra). Also, related methods of multivariate
statistical analysis, such as those using the singular
value decomposition (Alter et al., 2000, PNAS, 97(18),
p10101-10106; Ross et al., 2000, supra) or
multidimensional scaling can be effective at reducing
the dimensions of the objects under study.

However, methods such as cluster analysis and
singular value decomposition are purely exploratory and
only provide a broad overview of the internal structure
present in the data. They are unsupervised approaches
in which the available information concerning the nature
of the class under investigation is not used in the
analysis. Often, the nature of the biological
perturbation to which a particular sample has been
subjected is known. For example, it is sometimes known
whether the sample whose gene expression pattern is
being analysed derives from a diseased or healthy
individual. In such instances, discriminant analysis
can be used for classifying samples into various groups
based on their gene expression data.

In such an analysis one builds a classifier, by
training on the data, that is capable of discriminating
between members and non-members of a given class. The
trained classifier can then be used to predict the class
of unknown samples. Examples of discrimination methods
that have been described in the literature include
Support Vector Machines (Brown et al., 2000, PNAS, 97,
p262-267), Nearest Neighbour (Dudoit et al., 2000,
supra), Classification trees (Dudoit et al., 2000,
supra), Voted classification (Dudoit et al., 2000,
supra), Weighted Gene voting (Golub et al., 1999, supra),
and Bayesian classification (Keller et al., 2000, Tech.
report, Univ. of Washington). Also a technique in which
PLS (Partial Least Squares) regression analysis is first
used to reduce the dimensions in the gene expression
data set followed by classification using logistic
discriminant analysis and quadratic discriminant
analysis (LD and QDA) has recently been described
(Nguyen & Rocke, 2002, Bioinformatics, 18, p39-50 and
p1216-1226).

A challenge that gene expression data poses to
classical discriminatory methods is that the number of
genes whose expression is being analysed is very large
compared to the number of samples being analysed.
However, in most cases only a small fraction of these
genes are informative in discriminant analysis problems.
Moreover, there is a danger that the noise from
irrelevant genes can mask or distort the information
from the informative genes. Several methods have been
suggested in the literature to identify and select genes
that are informative in microarray studies, for example,
t-statistics (Dudoit et al., 2002, J. Am. Stat. Ass., 97,
p77-87), analysis of variance (Kerr et al., 2000, PNAS,
98, p8961-8965), Neighbourhood analysis (Golub et al.,
1999, supra), Ratio of between groups to within groups
sum of squares (Dudoit et al., 2002, supra),
Non-parametric scoring (Park et al., 2002, Pacific Symposium
on Biocomputing, p52-63) and Likelihood selection
(Keller et al., 2000, supra).

In the methods described herein the gene expression
data that has been normalized and standardized is
analysed by using Partial Least Squares Regression
(PLSR). Although PLSR is primarily a method used for
regression analysis of continuous data (see Appendix A),
it can also be utilized as a method for model building
and discriminant analysis using a dummy response matrix
based on a binary coding. The class assignment is based
on a simple dichotomous distinction such as breast
cancer (class 1) / healthy (class 2), or a multiple
distinction based on multiple disease diagnosis such as
breast cancer (class 1) / Alzheimer (class 2) / healthy
(class 3). The list of diseases for classification can
be increased depending upon the samples available
corresponding to other diseases or conditions or stages
thereof.

PLSR applied as a classification method is referred
to as PLS-DA (DA standing for Discriminant Analysis).
PLS-DA is an extension of the PLSR algorithm in which
the Y-matrix is a dummy matrix containing n rows
(corresponding to the number of samples) and K columns
(corresponding to the number of classes). The Y-matrix
is constructed by inserting 1 in the kth column and -1
in all the other columns if the corresponding ith object
of X belongs to class k. By regressing Y onto X,
classification of a new sample is achieved by selecting
the group corresponding to the largest component of the
fitted response, $y(x) = (y_1(x), y_2(x), \ldots, y_K(x))$.
Thus, in a
-1/1 response matrix, a prediction value below 0 means
that the sample belongs to the class designated as -1,
while a prediction value above 0 implies that the sample
belongs to the class designated as 1.
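
The dummy coding of the Y-matrix and the class assignment rule can be sketched as below. The regression step itself is omitted; the fitted response vector is a made-up example of what a PLSR model might return, and the function names are hypothetical.

```python
def dummy_response(class_labels, classes):
    """Build the n x K dummy Y-matrix: +1 in the column of the sample's class, -1 elsewhere."""
    return [[1 if label == c else -1 for c in classes] for label in class_labels]

def assign_class(fitted_row, classes):
    """Select the class whose fitted response component is largest."""
    best = max(range(len(classes)), key=lambda k: fitted_row[k])
    return classes[best]

classes = ["breast cancer", "Alzheimer", "healthy"]
labels = ["healthy", "breast cancer", "Alzheimer", "healthy"]
Y = dummy_response(labels, classes)

# Hypothetical fitted response for one new sample (not produced by a real model).
y_hat = [-0.7, 0.4, -0.2]
predicted = assign_class(y_hat, classes)
```

With two classes the same rule reduces to the sign test described in the text, since the largest component is the one above 0.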

An advantage of PLSR-DA is that the results
obtained can be easily represented in the form of two
different plots, the score and loading plots. Score
plots represent a projection of the samples onto the
principal components and show the distribution of the
samples in the classification model and their
relationship to one another. Loading plots display
correlations between the variables present in the data
set.

It is usually recommended to use PLS-DA as a
starting point for the classification problem due to its
ability to handle collinear data, and the property of
PLSR as a dimension reduction technique. Once this
purpose has been satisfied, it is possible to use other
methods such as Linear Discriminant Analysis (LDA), which
has been shown to be effective in extracting further
information (Indahl et al., 1999, Chem. and Intell. Lab.
Syst., 49, p19-31). This approach is based on first
decomposing the data using PLS-DA, and then using the
scores vectors (instead of the original variables) as
input to LDA. Further details on LDA can be found in
Duda and Hart (Pattern Classification and Scene Analysis,
1973, Wiley, USA).

The next step following model building is model
validation. This step is considered to be amongst the
most important aspects of multivariate analysis, and
tests the "goodness" of the calibration model which has
been built. In this work, a cross validation approach
has been used for validation. In this approach, one or
a few samples are kept out in each segment while the
model is built using a full cross-validation on the
basis of the remaining data. The samples left out are
then used for prediction/classification. Repeating the
simple cross-validation process several times holding
different samples out for each cross-validation leads to
a so-called double cross-validation procedure. This
approach has been shown to work well with a limited
amount of data, as is the case in some of the Examples
described here. Also, since the cross validation step
is repeated several times the dangers of model bias and
overfitting are reduced.
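
One way to form the cross-validation segments is sketched below. It makes the assumption that replicate measurements of the same sample are held out together, so that a model is never tested on a replicate of a sample it was trained on; the function name and sample identifiers are hypothetical.

```python
def leave_one_sample_out(sample_ids):
    """Yield (training_indices, held_out_indices), one segment per unique sample id.
    All replicate measurements of a sample are held out together."""
    for held in sorted(set(sample_ids)):
        test_idx = [i for i, s in enumerate(sample_ids) if s == held]
        train_idx = [i for i, s in enumerate(sample_ids) if s != held]
        yield train_idx, test_idx

# Hypothetical data set in which sample "P2" was measured twice.
sample_ids = ["P1", "P2", "P2", "P3"]
segments = list(leave_one_sample_out(sample_ids))
```

Each segment's training indices would be used to build a model and the held-out indices to predict the class of the samples left out.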

Once a calibration model has been built and
validated, genes exhibiting an expression pattern that
is most relevant for describing the desired information
in the model can be selected by techniques described in
the prior art for variable selection, as mentioned
elsewhere. Variable selection will help in reducing the
final model complexity, provide a parsimonious model,
and thus lead to a reliable model that can be used for
prediction. Moreover, use of fewer genes for the
purpose of providing diagnosis will reduce the cost of
the diagnostic product. In this way informative probes
which would bind to the genes of relevance may be
identified.

We have found that after a calibration model has
been built, statistical techniques like Jackknife
(Efron, 1982, The Jackknife, the Bootstrap and Other
Resampling Plans. Society for Industrial and Applied
Mathematics, Philadelphia, USA), based on resampling
methodology, can be efficiently used to select or
confirm significant variables (informative probes).

The approximate uncertainty variance of the PLS
regression coefficients B can be estimated by:

$$s^2B = \sum_{m=1}^{M} \left( (B - B_m)\, g \right)^2$$

where

$s^2B$ = estimated uncertainty variance of B;
B = the regression coefficient at the cross validated
rank A using all the N objects;
$B_m$ = the regression coefficient at the rank A using all
objects except the object(s) left out in cross
validation segment m; and
g = scaling coefficient (here: g=1).

In our approach, Jackknife has been implemented
together with cross-validation. For each variable the
difference between the B-coefficient $B_i$ in a
cross-validated sub-model and $B_{tot}$ for the total model is
first calculated. The sum of the squares of the
differences is then calculated over all sub-models to
obtain an expression of the variance of the $B_i$ estimate
for a variable. The significance of the estimate of $B_i$
is calculated using the t-test. Thus, the resulting
regression coefficients can be presented with
uncertainty limits that correspond to 2 standard
deviations, and from that significant variables are
detected.
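
The variance estimate above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the example coefficients are hypothetical, the coefficients are assumed to have been obtained already from the total model and from each cross-validation sub-model, and the decision rule used here (uncertainty limits of 2 standard deviations that exclude zero) stands in for the t-test mentioned in the text.

```python
import math

def jackknife_uncertainty(b_total, b_submodels, g=1.0):
    """Jackknife uncertainty of regression coefficients.

    b_total     : coefficients B of the model built on all N objects
    b_submodels : list of coefficient vectors B_m, one per CV segment
    g           : scaling coefficient (g = 1, as in the text)
    """
    n_vars = len(b_total)
    variances = [0.0] * n_vars
    for b_m in b_submodels:
        for i in range(n_vars):
            # Accumulate ((B - B_m) * g)^2 over all M sub-models.
            variances[i] += ((b_total[i] - b_m[i]) * g) ** 2
    std_devs = [math.sqrt(v) for v in variances]
    # Assumed rule: significant if the +/- 2 SD limits exclude zero.
    significant = [abs(b) > 2.0 * s for b, s in zip(b_total, std_devs)]
    return variances, std_devs, significant

# Hypothetical coefficients for two variables and three sub-models.
b_total = [0.80, 0.05]
b_sub = [[0.78, 0.20], [0.82, -0.10], [0.80, 0.05]]
variances, std_devs, significant = jackknife_uncertainty(b_total, b_sub)
```

In this made-up example the first coefficient is stable across sub-models and is flagged as significant, whereas the second changes sign between sub-models and is not.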

No further details as to the implementation or use
of this step are provided here since this has been
implemented in commercially available software, The
Unscrambler, CAMO ASA, Norway. Also, details on
variable selection using Jackknife can be found in
Westad & Martens (2000, J. Near Inf. Spectr., 8, p117-124).

The following approach can be used to select
informative probes from a gene expression data set:

a) keep out one unique sample (including its
repetitions if present in the data set) per cross
validation segment;

b) build a calibration model (cross validated
segment) on the remaining samples using PLSR-DA;

c) select the significant genes for the model in
step b) using the Jackknife criterion;

d) repeat the above 3 steps until all the unique
samples in the data set are kept out once (as described
in step a)). For example, if 75 unique samples are
present in the data set, 75 different calibration models
are built resulting in a collection of 75 different sets
of significant probes;

e) select the most significant variables using
the frequency of occurrence criterion in the generated
sets of significant probes in step d). For example, a
set of probes appearing in all sets (100%) is more
informative than probes appearing in only 50% of the
generated sets in step d).
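
Step e) above can be illustrated with a short Python sketch. The function name, the probe identifiers and the thresholds are hypothetical and chosen only for illustration; the sets of significant probes are assumed to have been produced already by steps a) to d).

```python
from collections import Counter

def select_by_occurrence(probe_sets, min_fraction=1.0):
    """Return probes present in at least `min_fraction` of the sets."""
    # Count in how many sets each probe occurs (once per set at most).
    counts = Counter(p for s in probe_sets for p in set(s))
    n_sets = len(probe_sets)
    return sorted(p for p, c in counts.items() if c / n_sets >= min_fraction)

# Hypothetical sets of significant probes from four calibration models.
sets = [{"p1", "p2", "p3"}, {"p1", "p3"}, {"p1", "p2"}, {"p1", "p4"}]
always = select_by_occurrence(sets, 1.0)  # probes found in every set
half = select_by_occurrence(sets, 0.5)    # probes found in >= 50% of sets
```

Raising the threshold gives a smaller, more stringent probe list; lowering it retains probes that were significant in only some of the sub-models.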

Once the informative probes for a disease have been
selected, a final model is made and validated. The two
most commonly used ways of validating the model are
cross-validation (CV) and test set validation. In
cross-validation, the data is divided into k subsets.
The model is then trained k times, each time leaving out
one of the subsets from training, and using only the
omitted subset to compute the error criterion, RMSEP (Root
Mean Square Error of Prediction). If k equals the
sample size, this is called "leave-one-out"
cross-validation. The idea of leaving one or a few samples
out per validation segment is valid only in cases where
the covariance between the various experiments is zero.
Thus, a one-sample-at-a-time approach cannot be justified
in situations containing replicates, since keeping only
one of the replicates out will introduce a systematic
bias in our analysis. The correct approach in this case
will be to leave out all replicates of the same sample
at a time, since that would satisfy the assumption of zero
covariance between the CV segments.
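
The segmentation described above, in which all replicates of a sample are left out together, can be sketched as follows. This is a minimal illustration, not the implementation used in the Examples: the helper names are hypothetical, and the `fit` and `predict` arguments stand for any calibration method (a trivial mean predictor is used here only so that the sketch runs).

```python
import math
from collections import defaultdict

def replicate_segments(sample_ids):
    """Group row indices so all replicates of a sample share one segment."""
    groups = defaultdict(list)
    for row, sample_id in enumerate(sample_ids):
        groups[sample_id].append(row)
    return list(groups.values())

def rmsep(y_true, y_pred):
    """Root Mean Square Error of Prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def cross_validate(x_rows, y, sample_ids, fit, predict):
    """Leave one unique sample (with its replicates) out per segment."""
    predictions = [None] * len(y)
    for test_rows in replicate_segments(sample_ids):
        held_out = set(test_rows)
        train = [r for r in range(len(y)) if r not in held_out]
        model = fit([x_rows[r] for r in train], [y[r] for r in train])
        for r in test_rows:
            predictions[r] = predict(model, x_rows[r])
    return rmsep(y, predictions)

# Hypothetical stand-in model: predicts the mean training response.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def predict_mean(model, x):
    return model

ids = ["A", "A", "B", "C", "C"]         # samples A and C were assayed twice
y = [1.0, 1.0, -1.0, 1.0, 1.0]          # dummy class labels
x = [[0.0]] * 5                         # placeholder expression data
segments = replicate_segments(ids)
error = cross_validate(x, y, ids, fit_mean, predict_mean)
```

Because replicates never appear on both sides of a split, the error estimate is not flattered by a model having already seen another copy of the held-out sample.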

The second approach for model validation is to use
a separate test set for validating the calibration
model. This requires running a separate set of
experiments to be used as a test set. This is the
preferred approach provided that real test data are
available.

The final model is then used to identify a disease,
condition or stage thereof in test samples. For this
purpose, expression data of selected informative genes
is generated from test samples and then the final model
is used to determine whether a sample belongs to a
diseased or non-diseased class or has a condition or
stage thereof.

Thus viewed from a yet further aspect the present
invention provides a method of identifying probes useful
for diagnosing or identifying or monitoring a disease or
condition or stage thereof in an organism, comprising
the steps of:

a) immobilizing a set of oligonucleotide probes,
preferably as described hereinbefore, on a
solid support;

b) isolating mRNA from a sample of a normal
organism (normal sample), which may optionally
be reverse transcribed to cDNA;

c) isolating mRNA from a sample from an organism,
corresponding to the sample and organism of
step (b), which is known to have said disease
or condition or a stage thereof (diseased
sample), which may optionally be reverse
transcribed to cDNA;

d) hybridizing the mRNA or cDNA of steps (b) and
(c) to said set of immobilized oligonucleotide
probes of step (a);

e) assessing the amount of mRNA or cDNA
hybridizing to each of said oligonucleotide
probes to determine the level of gene
expression of genes to which said
oligonucleotide probes bind in said normal and
diseased samples to generate a gene expression
data set for each sample;

f) normalizing and standardizing said data set of
step (e);

g) constructing a calibration model for
classification, preferably using the
statistical techniques Partial Least Squares
Discriminant Analysis (PLS-DA) and Linear
Discriminant Analysis (LDA); and

h) performing Jackknife analysis and identifying
those oligonucleotide probes which are
required for classification of said disease
and normal samples into their respective
groups.

Preferably a model for classification purposes is
generated by using the data relating to the probes
identified according to the above described method.
Preferably the sample is as described previously.
Preferably the oligonucleotides which are immobilized in
step (a) are randomly selected as described below or are
the probes as described hereinbefore. Such
oligonucleotides may be of considerable length, e.g. if
using cDNA (which is encompassed within the scope of the
term "oligonucleotide"). The identification of such
cDNA molecules as useful probes allows the development
of shorter oligonucleotides which reflect the
specificity of the cDNA molecules but are easier to
manufacture and manipulate.

The above described model may then be used to
generate and analyse data of test samples and thus may
be used for the diagnostic methods of the invention. In
such methods the data generated from the test sample
provides the gene expression data set and this is
normalized and standardized as described above. This is
then fitted to the calibration model described above to
provide classification.

The method described herein can also be used to
simultaneously select informative probes for several
related and unrelated diseases or conditions. Depending
upon which diseases or conditions have been included in
the calibration or training set, informative probes can
be selected for the said diseases or conditions. The
informative probes selected for one disease or condition
may or may not be similar to the informative probes
selected for another disease or condition of interest.
It is the pattern with which the selected genes are
expressed in relation to each other during a disease,
condition, or stage thereof, that determines whether or
not they are informative for the disease, condition or
stage thereof.

In other words, informative genes are selected
based on how their expression correlates with the
expression of other selected informative genes under the
influence of responses generated by the disease,
condition or stage thereof under investigation. In
Examples 1 and 2 provided hereinafter, 139 informative
probes were selected for breast cancer diagnosis and 182
probes were selected for Alzheimer's disease diagnosis
by training the gene expression data set of genes
representing 1435 or 758 randomly picked cDNA clones for
breast cancer/non-breast cancer samples, or
Alzheimer/non-Alzheimer samples, respectively. Among
the probes selected for breast cancer and Alzheimer's
disease, about 10 probes were informative both for breast
cancer and Alzheimer's disease diagnosis.

For the purpose of isolating informative probes or
identifying several related and unrelated diseases,
conditions and stages thereof simultaneously, the gene
expression data set must contain the information on how
genes are expressed when the subject has a particular
disease, condition or stage thereof under investigation.
The data set is generated from a set of healthy or
diseased samples, where a particular sample may contain
the information of only one disease, condition or stage
thereof or may also contain information about multiple
diseases, conditions or stages thereof. For example, if
the isolation of informative probes for Alzheimer's
disease, breast cancer and diabetes is sought, whole
blood samples can be obtained from an Alzheimer's patient
who has breast cancer and diabetes. Hence, the method
also teaches an efficient experimental design to reduce
the number of samples required for isolating informative
probes by selecting samples representing more than one
disease, condition or stage thereof.

As mentioned previously, in view of the high
information content of most transcripts, the
identification and selection of informative probes for
use in diagnosing, monitoring or identifying a
particular disease, condition or stage thereof may be
dramatically simplified. Thus the pool of genes from
which a selection may be made to identify informative
probes may be radically reduced.

Unlike prior art technologies, such as microarrays, in
which informative probes are selected from a population
of thousands of genes that are being expressed in a cell,
in the method described herein the
informative probes are selected from a limited number of
randomly obtained genes. For example, from a population
of 1435 cDNA clones, randomly picked from a human whole
blood cDNA library, we were able to select 139
informative probes for breast cancer diagnosis (see
Example 1 and Table 2).

Thus in a preferred aspect of the above mentioned
method of identifying probes useful for diagnosing or
identifying or monitoring a disease or condition or
stage thereof in an organism, said set of
oligonucleotides which are immobilized in step (a) are
randomly selected from a larger set of oligonucleotides,
e.g. from a cDNA library or other oligonucleotide pool,
which may be, but is preferably not, selected from the
set provided herein. Preferably said larger set
comprises oligonucleotides which correspond to
moderately or highly expressed genes. Thus preferably
in methods of the invention, the set of oligonucleotides
according to the invention is replaced with a set of
oligonucleotides which are randomly selected, e.g. from
commercially available oligonucleotide or cDNA
libraries.

As referred to herein "random" refers to selection
which is not biased based on the extent of information
carried by the transcripts in relation to the disease,
condition or organism under study, i.e. without bias
towards their likely utility as informative probes.
Whilst a random selection may be made from a pool of
transcripts (or related products) which have been
biased, e.g. to highly or moderately expressed
transcripts, preferably random selection is made from a
pool of transcripts not biased or selected by a
sequence-based criterion. The larger set may therefore
contain oligonucleotides corresponding to highly and
moderately expressed genes, or alternatively, may be
enriched for those corresponding to the highly and
moderately expressed genes.

Random selection from highly and moderately
expressed genes can be achieved in a wide variety of
ways. A strategy used in this work, but not limiting in
itself, involves randomly picking a significant number of
cDNA clones from a cDNA library constructed from a
biological specimen under investigation. Since, in a
cDNA library, the cDNA clones corresponding to
transcripts present in high or moderate amount are more
frequently present than cDNA clones corresponding to
transcripts present in low amount, the former will tend to be
picked up more frequently than the latter. A pool of
cDNA enriched for those corresponding to highly and
moderately expressed genes can be isolated by this
approach.
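
The enrichment argument above can be made concrete with a small calculation. The sketch below assumes that clones are picked independently from a library large enough that picking one clone does not change the composition noticeably; the function name and the two abundance fractions are hypothetical and serve only as an illustration.

```python
def prob_picked_at_least_once(fraction, n_picks):
    """Probability that a transcript making up `fraction` of the library
    is represented at least once among `n_picks` independent random picks."""
    return 1.0 - (1.0 - fraction) ** n_picks

n_picks = 1435  # number of clones picked in Example 1
# Hypothetical abundances: 0.1% and 0.001% of the clones in the library.
abundant = prob_picked_at_least_once(1e-3, n_picks)
rare = prob_picked_at_least_once(1e-5, n_picks)
```

Under these assumptions the more abundant transcript is very likely to be represented among the picked clones, whereas the rare one is unlikely to appear at all, which is the sense in which random picking enriches for highly and moderately expressed genes.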

To identify genes that are expressed in high or
moderate amount among the isolated population for use in
methods of the invention, the information about the
relative level of their transcripts in samples of
interest can be generated using several prior art
techniques. Both non-sequence based methods, such as
differential display or RNA fingerprinting, and
sequence-based methods such as microarrays or
macroarrays can be used for the purpose. Alternatively,
specific primer sequences for highly and moderately
expressed genes can be designed and methods such as
quantitative RT-PCR can be used to determine the levels
of highly and moderately expressed genes. Hence, a
skilled practitioner may use a variety of techniques
which are known in the art for determining the relative
level of mRNA in a biological sample.

Especially preferably the sample for the isolation
of mRNA in the above described method is as described
previously and is preferably not from the site of
disease and the cells in said sample are not disease
cells and have not contacted disease cells.

The following examples are given by way of
illustration only in which the Figures referred to are
as follows:

Figure 1 shows the effect of Direct Standardization
(DS) on the Alzheimer data measured in two different
series of experiments in which AD denotes Alzheimer's
samples and A, B are non-Alzheimer's samples. The
samples in both series have been labelled systematically
as (xx_7/xx_8), whereas the corrected samples from
series 8 (in b, c, d) have been labelled as (xx_c); thus,
for example, AD2-7 denotes Alzheimer's disease sample
number 2 in experiment series 7. The circled spots
represent the samples chosen as the transfer samples.
The connecting lines in figures b, c, d show the proximity
of the replicated samples after applying DS. The dashed
lines in figures a, c, d represent the decision boundary
separating the classes. These lines have not been drawn
on the basis of any statistical criteria, but serve the
purpose of visually separating the classes. All
four figures show scores plots (PC1-PC2) from PCA
analysis based on (a) non-standardized data, (b) scores
plot after direct standardization using 3 transfer
samples, (c) scores plot after direct standardization
using 4 transfer samples, (d) scores plot after direct
standardization using 8 transfer samples;

Figure 2 shows the projection of normal (including
benign) and breast cancer samples onto a classification
model generated by PLSR-DA using the data of 44
informative genes, in which PC is the principal
components and N and C are normal and breast cancer
samples, respectively;

Figure 3 shows the projection of individuals with
and without Alzheimer's disease onto a classification
model generated by PLSR-DA using 182 informative genes;

Figures 4, 6 and 8 show projection plots as Figure
2 in which the classification model is generated using
719, 111 and 345 cDNAs, respectively, wherein PC is the
principal components, N denotes normal and B denotes
breast cancer samples;

Figures 5, 7 and 9 show prediction plots based on 3
principal components using the data of 719, 111 and 345
cDNAs, respectively;

Figure 10 shows a projection plot as Figure 3 in
which the classification model is generated using 520
cDNAs; and

Figure 11 is the prediction plot corresponding to
Figure 10.

Example 1: Diagnosis of Breast Cancer

Methods

Whole blood was obtained from the arms of breast cancer
patients and patients with benign tumours (Ulleval and
Haukland hospitals in Norway). All of the patients with
breast cancer had a malignant tumour of the breast
(disease samples). Healthy blood was collected from the
above two hospitals, or collected at a Health station at
As, Norway or at DiaGenic AS, Norway, from the arms of
female donors with no reported signs of breast cancer.
The blood from healthy individuals or individuals with
benign tumours comprises the normal samples. The blood was
either collected in tubes containing EDTA and stored
immediately at -80°C or was collected in PAXgene tubes
and stored for 12-24 hours at room temperature before
finally being stored at -80°C before use. Further
details of the breast cancer and benign tumour patients
from whom blood was taken are provided in Table 5.

mRNA was isolated from the blood of the 29 breast cancer
patients and 46 normal donors and used to prepare
labelled probes by reverse transcribing in the presence
of [α-33P]dATP. The first strand cDNA of the normal and
diseased samples was bound separately to 1435 cDNA
clones immobilized on a solid support (nylon membrane).
These cDNA clones were randomly picked, without any
prior knowledge of their gene sequences, from a cDNA
library constructed using whole blood of 550 healthy
individuals (Clontech, Palo Alto, USA). These methods
were conducted as follows.

For amplification of inserts, bacterial clones were
grown in microtiter plates containing 150 µl LB with 50
µg/ml carbenicillin, and incubated overnight with
agitation at 37°C. To lyse the cells, 5 µl of each
culture were diluted with 50 µl H2O and incubated for 12
min. at 95°C. Of this mixture, 2 µl were subjected to a
PCR reaction using 20 pmoles of M13 forward and reverse
primer in the presence of 1.5 mM MgCl2. PCR reactions were
performed with the following cycling protocol: 4 min. at
95°C, followed by 25 cycles of 1 min. at 94°C, 1 min. at
60°C and 3 min. at 72°C, either in a RoboCycler®
Temperature Cycler (Stratagene, La Jolla, USA) or a DNA
Engine Dyad Peltier Thermal Cycler (MJ Research Inc.,
Waltham, USA). The amplified products were denatured by
incubating with NaOH (0.2 M, final concentration) for 30
min. and spotted onto Hybond-N+ membranes (Amersham
Pharmacia Biotech, Little Chalfont, UK), using a MicroGrid
II workstation according to the manufacturer's
instructions (BioRobotics Ltd, Cambridge, England). The
immobilized cDNAs were fixed using a UV cross-linker
(Hoefer Scientific Instruments, San Francisco, USA).

In addition to the 1435 cDNAs, the printed arrays also
contained controls for assessing background level,
consistency and sensitivity of the assay. These were
spotted at multiple positions and included controls such
as PCR mix (without any insert); positive and negative
controls of SpotReport™ 10 array validation system
(Stratagene, La Jolla, USA) and cDNAs corresponding to
constitutively expressed genes such as β-actin, γ-actin,
GAPDH, HOD and cyclophilin. Also, oligonucleotides
corresponding to SIX1, β-tubulin, TRP-2, MDM2, Myosin
Light C, CD44, Maspin, Laminin, and SRP 19 were included
to detect disseminated cancer cells.

The total RNA from blood collected in EDTA tubes was
purified using Trizol LS Reagent protocol
(Invitrogen/Life Technologies). From blood contained in
PAXgene tubes, the total RNA was purified according to
the supplier's instructions (PreAnalytiX, Hombrechtikon,
Switzerland). Contaminating DNA was removed from the
isolated RNA by DNase I treatment using DNA-free kit
(Ambion, Inc., Austin, USA). RNA quality was determined
visually by inspecting the integrity of 28S and 18S
ribosomal bands following agarose gel electrophoresis.
The concentration and purity of extracted RNA was
determined by measuring the absorbance at 260 nm and 280
nm. mRNA was isolated from the total RNA using Dynabeads
as per the supplier's instructions (Dynal AS, Oslo,
Norway).

Labelling and hybridization experiments were performed
in batches. The number of samples assayed in each batch
varied from six to nine. In the case of samples that
were assayed more than once (replicates), aliquots
derived from the same mRNA pool were used for probe
synthesis. For probe synthesis, aliquots of mRNA
corresponding to 4-5 µg of total RNA were mixed together
with oligo-dT25NV (0.5 µg/ml) and mRNA spikes of
SpotReport™ 10 array validation system (10 pg; Spike 2,
1 pg), heated to 70°C to remove secondary structures,
and then chilled on ice. Probes were prepared in 35 µl
reaction mixes by reverse transcription in the presence
of 50 µCi [α-33P]dATP, 3.5 µM dATP, 0.6 mM each of dCTP,
dTTP, dGTP, 200 units of Superscript reverse
transcriptase (Invitrogen, Life Technologies) and 0.1 M
DTT, labelling for 1.5 hr at 42°C. Following synthesis,
the enzyme was deactivated for 10 min. at 70°C and mRNA
removed by incubating the reaction mix for 20 min. at
37°C in 4 units of Ribo H (Promega, Madison, USA).
Unincorporated nucleotides were removed using ProbeQuant
G-50 Columns (Amersham Biosciences, Piscataway, USA).

Prior to hybridization, the membranes were equilibrated
in 4 x SSC for 2 hr at room temperature and
prehybridized overnight at 65°C in 10 ml
prehybridisation solution (4 x SSC, 0.1 M NaH2PO4, 1 mM
EDTA, 8% dextran sulphate, 10 x Denhardt's solution, 1%
SDS). Freshly prepared probes were added to 5 ml of the
same prehybridisation solution, and hybridization
continued overnight at 65°C. The membranes were washed
at 65°C at increasing stringency (2 x 30 min. each in 2
x SSC, 0.1% SDS; 1 x SSC, 0.1% SDS; 0.1 x SSC, 0.1% SDS)
to remove unspecific signals.

The amount of labelled first strand cDNA binding to each
spot was assessed and quantified using a PhosphorImager
to generate a gene expression data set. The data was
generated using Phoretix software version 3 (Non Linear
Dynamics, England). Background subtraction was
performed on the generated data by subtracting the
median of the line of pixels around each spot outline
from the total intensity obtained from the respective
spots.

The background-subtracted data was then normalized and
transformed. First, the 50 lowest and 50 highest
signals from each membrane were selected out. This step
was to exclude genes that were expressed with a high
degree of variance. Since the genes selected out varied
from membrane to membrane, the expression data from 497
genes were removed from the data set. The values for the
remaining 938 genes were then normalised by using different
approaches such as external controls, dividing each spot
by the median intensity of the observed signal in the
respective membrane, range normalizing the data from
each membrane, and then log transforming the data
obtained.
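
The filtering and one of the normalisation routes described above (division by the membrane median followed by log transformation) might be sketched as follows. This is an illustration under several assumptions that are not stated in the text: the function names are hypothetical, signals are assumed to be positive after background subtraction, the median is taken over the retained spots only, and the natural logarithm is used.

```python
import math
import statistics

def extreme_spots(signals, k):
    """Indices of the k lowest and k highest signals on one membrane."""
    order = sorted(range(len(signals)), key=lambda i: signals[i])
    return set(order[:k]) | set(order[len(order) - k:])

def filter_and_normalise(membranes, k):
    """Drop spots that are extreme on any membrane, then median-normalise
    and log-transform the remaining spots of each membrane."""
    excluded = set()
    for signals in membranes:
        excluded |= extreme_spots(signals, k)
    kept = [i for i in range(len(membranes[0])) if i not in excluded]
    normalised = []
    for signals in membranes:
        med = statistics.median([signals[i] for i in kept])
        normalised.append([math.log(signals[i] / med) for i in kept])
    return kept, normalised

# Two hypothetical membranes with five spots each; k = 1 for brevity
# (the text uses the 50 lowest and 50 highest signals per membrane).
membranes = [[1.0, 10.0, 20.0, 40.0, 500.0],
             [2.0, 12.0, 18.0, 900.0, 30.0]]
kept, normalised = filter_and_normalise(membranes, k=1)
```

Because the extreme spots differ between membranes, the union of excluded spots is larger than 2k, which mirrors the observation in the text that 497 genes were removed in total.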

The processed data obtained above was then used to
isolate the informative probes by:

a) keeping one unique sample (including all
repetitions of the selected sample) out per cross
validation segment;

b) building a calibration model (cross validated)
on the remaining samples using PLSR-DA;

c) selecting the set of significant genes for the
model in step b) using the Jackknife criterion;

d) repeating steps a), b) and c) until all the
unique samples were kept out once (hence, in all 75
different calibration models were built (after repeating
step b) 75 times), resulting in 75 different sets of
significant probes (after repeating step c) 75 times));

e) selecting significant variables using the
frequency of occurrence criterion amongst the 75
different sets of significant probes.

The selected informative probes based on occurrence
criterion were used to construct a classification model.
The result of the classification model based on probes
appearing in at least 90% of the generated sets after
the step of isolating informative probes as described
above is shown in Figure 2 in which it is seen that the
expression pattern of these genes was able to classify
most women with breast cancer and women with no breast
cancer into distinct groups. In this figure PC1 and PC2
indicate the two principal components statistically
derived from the data which best define the systematic
variability present in the data. This allows each
sample, and the data from each of the informative probes
to which the sample's labelled first strand cDNA was
bound, to be represented on the classification model as
a single point which is a projection of the sample onto
the principal components - the score plot.

The ability of the generated model, based on isolated
informative probes, to predict future samples was
determined by the double cross-validation approach. The
performance of the diagnostic test for breast cancer
based on the occurrence criterion is presented in Table
6.

Correct prediction of most breast cancer samples was
achieved. These included all three samples obtained
from women with ductal carcinoma in situ (DCIS), 11/15
samples obtained from women with stage I breast cancer,
all five samples obtained from women with stage II
breast cancer, and one of two samples obtained from
women with stage III breast cancer. Interestingly, two
correctly predicted stage I samples were obtained from
women having a tumour size of <5 mm in diameter.

The model also correctly predicted the class of most
non-cancer samples (41/46), including those that were
obtained from women with non-cancerous breast
abnormalities.

That the gene transcripts are not from
cells which are disseminated disease cells has been
confirmed by several lines of evidence. Firstly, the
informative genes were expressed constitutively at high
or moderate levels in blood cells of women irrespective
of whether they had cancer or not. Secondly, in the
assay described in this Example, in order to identify
transcripts, at least 720 disseminated cells in blood
samples would be required. Since the average number of
disseminated cells present in blood during different
stages of breast cancer is much lower (organ-confined
breast cancer, 0.8 cells per ml; invasive breast cancer
spread to lymph nodes only, 2.4 cells per ml; and
metastatic breast cancer, 6 cells per ml; SD>100%) (29),
we believe that the signals being detected originated
from peripheral blood cells and could not have
originated from disseminated cells. Thirdly, we were
not able to detect any signal from the eight cancer
markers known to have elevated expression in malignant
cancer cells, including cancer cells that are
disseminated in the blood.



Example 2: Diagnosis of Alzheimer's disease

Similar experiments were conducted with samples from
Alzheimer's patients. In this method 7 patients
diagnosed with Alzheimer's Disease at the Memory Clinic
at Ulleval University Hospital were used in the trial.
The patients were confirmed as having Alzheimer's
disease based on the following criteria:

* A standardized interview with a care-giver using
IQCODE, an ADL scale and a scale measuring
behaviour of the patient (Green scale).

* Neuropsychological evaluation using MMSE, Clock
drawing test, Trailmaking test A and B (TMT A and
B), Kendrick object learning test (visual memory
test), part of the Wechsler battery and Benton
test.

* A psychiatric evaluation using scales for detection
of depression, MADRS for interviewing the patient
and Cornell scale for interviewing the care-giver.

* A physical examination.

* Laboratory tests of blood samples to rule out other
diseases.

* CT scan of the brain.

* SPECT of the brain.

The mean age of the patients was 72.3 with an age range
of 69-76. The mean MMSE score was 22.0 (the maximum
score attainable being 30).

Six age-matched individuals without diagnosed
Alzheimer's disease were used as a control. All had
been tested with MMSE and had a minimum score of 28
(mean: 28.4). The mean age of the normal control group
was 73.0 and the age range 66-81. A sample from a
16-year old individual, with a consequent minimal chance of
having Alzheimer's disease, was also included as an
additional control.

Using the methods described above (except that
hybridization to 758 rather than 1435 cDNA clones was
performed), informative probes were selected based on
occurrence criterion and used to construct a
classification model. The result of the classification
model based on probes appearing at least once in the
generated sets after the method to isolate informative
probes as described above is shown in Figure 3 in which
it will be seen that the expression pattern of these
genes was able to classify individuals with or without
Alzheimer's disease into distinct groups. In this
Figure PC1 and PC2 indicate the 2 principal components
statistically derived from the data which define the
systematic variability present in the data. This allows
each sample, and the data from each of the informative
probes to which the sample's cDNA was bound, to be
represented on the classification model as a single
point which is a projection of the sample onto the
principal components - the score plot.

The ability of the generated model, based on isolated
informative probes, to predict future samples was
determined by the double cross-validation. The
performance of the diagnostic test for Alzheimer's
disease is presented in Table 7.

Appendix A

Partial Least Squares regression (PLSR)

Let a multivariate regression model be defined as:

$$ Y = XB + F $$

where

X is an N×P matrix of the P predictor variables (genes)
measured on the N objects;
Y (N×J) contains the J predicted variables. In our case Y
represents a matrix containing dummy variables;
B is a matrix of regression coefficients; and
F is an N×J matrix of residuals.

The structure of the PLSR model can be written as:

$$ X = T P^{T} + E_A, \quad \text{and} $$

$$ Y = T Q^{T} + F_A, $$

where

T (N×A) is a matrix of score vectors which are linear
combinations of the x-variables;
P (P×A) is a matrix with the x-loading vectors $p_a$ as
columns;
Q (J×A) is a matrix with the y-loading vectors $q_a$ as
columns;
$E_A$ (N×P) is the matrix for X after A factors; and
$F_A$ (N×J) is the matrix for Y after A factors.

The criterion in PLSR is to maximize the explained
covariance of [X,Y]. This is achieved by the loading
weights vector $w_{a+1}$, which is the first eigenvector of
$E_a^{T} F_a F_a^{T} E_a$ ($E_a$ and $F_a$ are the deflated X and Y
after a factors or PLS components).

The regression coefficients are given by:

$$ B = W (P^{T} W)^{-1} Q^{T} $$

A PLSR model with full rank, i.e. maximum number of
components, is equivalent to the MLR solutions. Further
details on PLSR can be found in Martens & Naes, 1989,
Multivariate Calibration, John Wiley & Sons, Inc., USA
and Kowalski & Seasholtz, 1991, supra.
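
As a hedged illustration of the regression-coefficient formula, the sketch below works through the simplest special case: a single response variable (J = 1) and a single PLS component (A = 1), for which the first eigenvector of $X^{T} y y^{T} X$ is proportional to $X^{T} y$. The function name and the data are hypothetical, mean-centring is an assumption not stated in the appendix, and a full PLSR would repeat the step on the deflated matrices for further components.

```python
import math

def pls1_one_component(x_rows, y):
    """One-component PLS regression for a single response (illustrative)."""
    n, n_vars = len(x_rows), len(x_rows[0])
    # Centre X and y (assumed preprocessing).
    x_mean = [sum(row[j] for row in x_rows) / n for j in range(n_vars)]
    y_mean = sum(y) / n
    xc = [[row[j] - x_mean[j] for j in range(n_vars)] for row in x_rows]
    yc = [v - y_mean for v in y]
    # Loading weights w: direction of X^T y, scaled to unit length.
    w = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(n_vars)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores t = X w; loadings p = X^T t / t^T t and q = y^T t / t^T t.
    t = [sum(xc[i][j] * w[j] for j in range(n_vars)) for i in range(n)]
    tt = sum(v * v for v in t)
    p = [sum(xc[i][j] * t[i] for i in range(n)) / tt for j in range(n_vars)]
    q = sum(yc[i] * t[i] for i in range(n)) / tt
    # B = W (P^T W)^-1 Q^T; with one component P^T W is a scalar.
    ptw = sum(pj * wj for pj, wj in zip(p, w))
    b = [wj * q / ptw for wj in w]
    b0 = y_mean - sum(bj * mj for bj, mj in zip(b, x_mean))
    return b, b0

# Hypothetical data with one predictor, where y = 2x + 1 exactly.
x = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
b, b0 = pls1_one_component(x, y)
```

With a single predictor, one component is already the full-rank model, so the result coincides with the ordinary least squares slope and intercept, in line with the statement above that a full-rank PLSR model is equivalent to the MLR solution.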
Example 3: Validation of Example 1, diagnosis of breast
cancer

The results in Example 1 were validated by using the
informative probes identified in Example 1 on new breast
cancer and control samples.

Methods 

The methods, essentially as described in Example 1, were
used. Blood was taken from patients as described in
Table 8. However, blood was collected in PAXgene tubes
and the first strand labelled cDNAs were hybridized to
719 cDNAs spotted on nylon membranes along with other
controls as described in Example 1. After background
subtraction using control spots, the data of each
membrane was normalized using the inter-quantile range.
The data was analysed as described in Example 1 and the
model validated by cross validation.

The 719 cDNAs which were spotted are a subset of the
cDNAs spotted in Example 1 and include 111 cDNAs
described in Table 2 and which were found to be
informative in Example 1.
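The background subtraction and inter-quantile range normalization mentioned in the Methods above can be sketched as follows. The text does not specify how the background is estimated from the control spots or which quantiles are used, so this sketch assumes the median of the control spots as background and the range between the 25th and 75th percentiles as the scaling factor; the function name normalize_membrane and its parameters are hypothetical.

```python
import numpy as np


def normalize_membrane(signal, control_mask, lower_q=25.0, upper_q=75.0):
    """Background-subtract and scale the probe signals of one membrane.

    signal       : raw spot intensities for one membrane
    control_mask : True where a spot is a control spot
    Assumptions (not stated in the text): background is the median of the
    control spots, and scaling uses the lower_q-to-upper_q percentile range.
    """
    signal = np.asarray(signal, dtype=float)
    control_mask = np.asarray(control_mask, dtype=bool)

    background = np.median(signal[control_mask])
    corrected = signal[~control_mask] - background

    q_low, q_high = np.percentile(corrected, [lower_q, upper_q])
    spread = q_high - q_low
    if spread == 0:
        raise ValueError("inter-quantile range is zero; cannot normalize")
    return corrected / spread
```

Scaling each membrane by its own inter-quantile range puts membranes with different overall labelling intensity on a comparable scale before the multivariate analysis.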

Results

The results are shown in Figures 4 to 9. Figures 4, 6
and 8 are projection plots similar to Figure 2 and show
the projection of normal and breast cancer patients'
samples onto a classification model. Figure 4 uses a
model generated using all 719 cDNAs. Figure 6 is similar
but uses a classification model generated with the 111
probes common to Example 1. Figure 8 uses the 345
sequences of the 719 for which sequence information is
provided herein. In each case classification of normal
and breast cancer groups was possible. Figures 5, 7 and
9 show prediction plots which reflect the ability of the
generated models to correctly diagnose breast cancer. In
the 3 prediction plots shown, the disease samples appear
on the x axis at +1 and the non-disease samples appear at
-1. The y axis represents the predicted class membership.
During prediction, if the prediction is correct, disease
samples should fall above zero and non-disease samples
should fall below zero. In each case almost all samples
are correctly predicted.
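The reading of the prediction plots described above (disease coded +1, non-disease coded -1, with zero as the decision boundary) can be expressed as a small sketch. The function names are hypothetical and the values in the usage note are made up for illustration only; they are not data from the Examples.

```python
import numpy as np


def classify_by_sign(y_pred):
    """Assign +1 (disease) to predictions above zero, otherwise -1 (non-disease)."""
    y_pred = np.asarray(y_pred, dtype=float)
    return np.where(y_pred > 0, 1, -1)


def fraction_correct(y_true, y_pred):
    """Fraction of samples whose predicted class matches the known class (+1/-1)."""
    y_true = np.asarray(y_true, dtype=int)
    return float(np.mean(classify_by_sign(y_pred) == y_true))
```

For instance, with known classes [1, 1, -1, -1] and predicted values [0.8, -0.1, -0.6, -0.2], the second sample falls on the wrong side of zero and three of the four samples are counted as correctly predicted.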

Example 4: Validation of Example 2, diagnosis of
Alzheimer's disease

The results in Example 2 were validated by using the
informative probes identified in Example 2 on new
Alzheimer's patient samples.

Methods

The methods, essentially as described in Example 2, were
used. Twelve female patients diagnosed with Alzheimer's
disease at the Memory Clinic at Ulleval University
Hospital who were confirmed as having Alzheimer's
disease based on the criteria of Example 2 were used in
the trial. The mean age of the patients was 72.3 with
an age range of 66-83. The mean MMSE score was 22.0
(the maximum score attainable being 30).

Sixteen age-matched female individuals without diagnosed
Alzheimer's disease were used as the normal control
group. All had been tested with MMSE and had a minimum
score of 29. The mean age of the normal control group
was 74.0 and the age range 66-86.

After transfer of the blood to PAXgene tubes, total mRNA
was isolated from the blood of the Alzheimer's disease
and control group donors according to the manufacturer's
instructions (PreAnalytiX, Hombrechtikon, Switzerland).
The isolated mRNA was labelled during reverse
transcription in the presence of α-33P-dATP, yielding a
labelled first strand cDNA. Hybridization was performed
as described previously onto 730 cDNA clones picked from
a cDNA library from whole blood of 550 healthy
individuals without knowledge of the gene sequence of
the random cDNA clones.

The results are shown in Figures 10 and 11. Figure 10
is a projection plot generated using 520 probes which
have been sequenced. Figure 11 is a prediction plot and
shows correct prediction of almost all samples.
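Table 1a

List of probes informative for disease diagnosis

| No. | Clone ID | Sequence ID | No. of nucleotides |
| 1 | I-01 | - | - |
| 2 | I-02 | - | - |
| 3 | I-13 | - | - |
| 4 | I-21 | - | - |
| 5 | I-24 | 308 | 373 |
| 6 | I-28 | 310 | 564 |
| 7 | I-30 | 1180 | 622 |
| 8 | I-34 | 313 | 554 |
| 9 | I-37 | - | - |
| 10 | I-42 | - | - |
| 11 | I-52 | - | |
| 12 | I-54 | 1181 | 155 |
| 13 | [illegible] | [illegible] | 554 |
| 14 | I-71 | | |
| 15-21 | [illegible] | [illegible] | [illegible] |
| 22 | II-24 | 381 | |
| 23 | II-25 | 382 | 444 |
| 24 | II-26 | 383 | |
| 25 | II-33 | 390 | 523 |
| 26 | II-34 | 391 | 566 |
| 27 | II-41 | 397 | 534 |
| 28 | II-42 | 396 | 512 |
| 29 | II-47 | | |
| 30 | II-57 | 411 | 505 |
| 31 | II-61 | 415 | 596 |
| 32 | II-69 | 423 | 387 |
| 33 | II-70 | 424 | 420 |
| 34 | II-75 | 429 | 535 |
| 35 | II-83 | | |
| 36 | II-84 | 438 | 577 |
| 37 | II-87 | 441 | 552 |
| 38 | II-88 | 442 | 606 |
| 39 | II-90 | | |
| 40 | II-94 | 448 | 329 |
| 41 | III-02 | 453 | 747 |
| 42 | III-05 | | |
| 43 | III-06 | 458 | 682 |
| 44 | III-08 | 460 | 536 |
| 45 | III-10 | - | - |
| 46 | III-13 | 464 | 615 |
| 47 | III-15 | - | - |
| 48 | III-17 | - | - |
| 49 | III-20 | 1183 | 479 |
| 50 | III-23 | 473 | 694 |
| 51 | III-26 | 476 | 476 |
| 52 | III-35 | 485 | 551 |
| 53 | III-39 | 467 | 224 |
| 54 | III-40 | 488 | 349 |
| 55 | III-43 | 490 | 382 |
| 56 | III-44 | 491 | 382 |
| 57 | III-53 | 500 | 390 |
| 58 | III-56 | 503 | 109 |
| 59 | III-57 | 504 | 374 |
| 60 | III-60 | - | - |
| 61 | III-60 | - | |
| 62 | III-61 | 507 | 521 |
| 63 | III-63 | 509 | 575 |
| 64 | III-68 | | - |
| 65 | III-74 | 518 | 502 |
| 66 | III-80 | 523 | 585 |
| 67 | III-82 | - | - |
| 68 | III-85 | 526 | 516 |
| 69 | III-89 | 530 | 660 |
| 70 | III-92 | - | - |
| 71 | III-96 | - | |
| 72 | IV-14 | 684 | 545 |
| 73 | IV-15 | 1185 | 628 |
| 74 | IV-23 | - | - |
| 76 | IV-26 | 1186 | 494 |
| 75 | IV-26 | - | - |
| 77 | IV-29 | - | - |
| 78 | IV-31 | 687 | 268 |
| 79 | IV-32 | 688 | 569 |
| 80 | IV-34 | - | - |
| 81 | IV-35 | - | - |
| 82 | IV-41 | - | - |
| 83 | IV-45 | - | - |
| 84 | IV-53 | 61 | 362 |
| 85 | IV-62 | - | - |
| 86 | IV-69 | 192 | 286 |
| 87 | IV-80 | 701 | 579 |
| 88 | IV-82 | - | - |
| 89 | IV-93 | - | - |
| 90 | IX-10 | 736 | 641 |
| 91 | IX-12 | | |
| 92 | IX-38 | 757 | [illegible] |
| 93 | IX-39 | 758 | 424 |
| 94 | IX-42 | | |
| 95 | IX-48 | 764 | 626 |
| 96 | IX-77 | 785 | 556 |
| 97 | V-01 | | |
| 98 | V-02 | | |
| 99 | V-03 | 706 | [illegible] |
| 100 | V-04 | 707 | |
| 101 | V-06 | | |
| 102 | V-07 | 708 | |
| 103 | V-11 | 1188 | 599 |
| 104 | V-12 | 711 | 498 |
| 105 | V-15 | | |
| 106 | V-17 | | |
| 107 | V-21 | | |
| 108 | V-25 | - | |
| 109 | V-32 | - | |
| 110 | V-35 | | |
| 111 | V-39 | | |
| 112 | V-42 | | |
| 113 | V-43 | | |
| 114 | V-47 | | |
| 115 | V-49 | | |
| 116 | V-52 | | |
| 117 | V-54 | | |
| 118 | V-55 | 77 | |
| 119 | V-58 | | |
| 120 | [illegible] | [illegible] | [illegible] |
| 121 | V-65 | | |
| 122 | [illegible] | | |
| 123 | V-71 | | |
| 124 | V-75 | | |
| 125 | V-79 | | |
| 126 | V-80 | 726 | 260 |
| 127 | V-90 | | |
| 128 | V-91 | | |
| 129 | V-92 | | |
| 130 | V-94 | | |
| 131 | VI-02 | | |
| 132 | VI-04 | 865 | 122 |
| 133 | VI-07 | 93 | 405 |
| 134 | VI-09 | | |
| 135 | VI-10 | - | |
| 136 | VI-12 | 869 | 667 |
| 137 | VI-14 | 871 | 642 |
| 138 | VI-17 | | |
| 139 | VI-20 | 876 | 115 |
| 140 | VI-21 | | |
| 141 | VI-23 | 878 | 634 |
| 142 | VI-34 | | |
| 143 | VI-41 | | - |
| 144 | VI-42 | - | - |
| 145 | VI-43 | | |
| 146 | VI-44 | | |
| 147 | VI-48 | 891 | 626 |
| 148 | VI-49 | - | - |
| 149 | VI-50 | 893 | [illegible] |
| 150 | VI-52 | | |
| 151 | VI-53 | 895 | 560 |
| 152 | VI-55 | 897 | 509 |
| 153 | VI-65 | | |
| 154 | VI-70 | 108 | 550 |
| 155 | VI-71 | | |
| 156 | VI-72 | | - |
| 157 | VI-74 | 905 | 655 |
| 158 | VI-76 | 907 | 582 |
| 159 | VI-78 | - | - |
| 160 | VI-79 | - | - |
| 161 | VI-84 | - | - |
| 162 | VI-87 | 911 | 595 |
| 163 | VI-88 | 912 | 651 |
| 164 | VI-90 | - | - |
| 165 | VI-93 | - | - |
| 166 | VI-95 | 915 | 230 |
| 167 | VI-96 | | - |
| 168 | VII-02 | | - |
| 169 | VII-03 | 1196 | 412 |
| 170 | VII-06 | | - |
| 171 | VII-10 | | - |
| 172 | VII-11 | - | - |
| 173 | VII-15 | 1199 | 439 |
| 174 | VII-19 | 562 | 580 |
| 175 | VII-21 | 564 | 671 |
| 176 | VII-25 | - | - |
| 177 | VII-32 | 571 | 457 |
| 178 | VII-36 | 675 | 209 |
| 179 | VII-39 | 576 | 541 |
| 180 | VII-42 | 579 | 502 |
| 181 | VII-43 | 580 | 316 |
| 182 | VII-46 | 583 | 631 |
| 183 | VII-47 | 1200 | 526 |
| 184 | VII-48 | 1201 | 613 |
| 185 | VII-59 | 593 | 565 |
| 186 | VII-60 | - | - |
| 187 | VII-63 | 595 | 98 |
| 188 | VII-66 | 598 | 362 |
| 189 | VII-67 | - | - |
| 190 | VII-72 | 600 | 595 |
| 191 | VII-73 | 601 | 522 |
| 192 | VII-75 | | |
| 193 | VII-76 | 603 | 624 |
| 194 | VII-77 | 1203 | 692 |
| 195 | VII-80 | 605 | 338 |
| 196 | VII-81 | 606 | 556 |
| 197 | VII-83 | - | - |
| 198 | VII-86 | - | |
| 199 | VII-88 | | |
| 200 | VII-90 | 612 | 576 |
| 201 | VII-91 | 613 | 341 |
| 202 | VII-93 | 615 | 379 |
| 203 | VIII-01 | | |
| 204 | [illegible] | [illegible] | [illegible] |
| 205 | VIII-03 | | |
| 206 | VIII-06 | | |
| 207 | VIII-09 | 618 | 598 |
| 208 | VIII-10 | | |
| 209 | VIII-15 | | |
| 210 | VIII-20 | 628 | 419 |
| 211 | VIII-22 | | |
| 212 | VIII-26 | | |
| 213 | VIII-28 | 634 | [illegible] |
| 214 | VIII-29 | 635 | 592 |
| 215 | VIII-30 | 636 | 572 |
| 216 | VIII-31 | 637 | 482 |
| 217 | VIII-32 | 638 | 545 |
| 218 | VIII-33 | 639 | 624 |
| 219 | VIII-39 | | |
| 220 | VIII-41 | 645 | 649 |
| 221 | VIII-42 | 646 | 600 |
| 222 | VIII-44 | | |
| 223 | VIII-46 | 649 | 425 |
| 224 | VIII-48 | 651 | 251 |
| 225 | VIII-58 | | |
| 226 | VIII-64 | 663 | 627 |
| 227 | [illegible] | | |
| 228 | VIII-66 | 665 | 345 |
| 229 | VIII-67 | | |
| 230 | [illegible] | | |
| 231 | VIII-76 | 675 | 591 |
| 232 | [illegible] | | |
| 233 | VIII-82 | | |
| 234 | VIII-83 | | |
| 235 | VIII-85 | | |
| 236 | [illegible] | | |
| 237 | VIII-91 | | |
| 238 | VIII-92 | | |
| 239 | [illegible] | | |
| 240 | VIII-95 | | |
| 241 | X-04 | | |
| 242 | X-07 | 808 | 641 |
| 243 | X-15 | 614 | 132 |
| 244 | X-29 | 821 | 370 |
| 245 | X-34 | | |
| 246 | X-35 | | |
| 247 | X-54 | 837 | 603 |
| 248 | X-56 | 839 | 71 |
| 249 | X-68 | 1207 | 642 |
| 250 | X-72 | 849 | 622 |
| 251 | X-94 | 860 | 501 |
| 252 | XI-07 | | |
| 253 | XI-13 | 1209 | 620 |
| 254 | XI-50 | | |
| 255 | XI-58 | - | - |
| 256 | XI-81 | 1212 | 374 |
| 257 | XII-07 | 1213 | 567 |
| 258 | XII-17 | | |
| 259 | XII-26 | | - |
| 260 | XII-27 | | - |
| 261 | XII-31 | | |
| 262 | XII-32 | | |
| 263 | XII-35 | 1214 | 620 |
| 264 | XII-36 | | |
| 265 | XII-52 | | |
| 266 | XII-59 | 1216 | 484 |
| 267 | XIII-19 | 1219 | 559 |
| 268 | XIII-29 | - | [illegible] |
| 269 | XIII-52 | 939 | 513 |
| 270 | XIII-62 | - | |
| 271 | XIII-84 | - | - |
| 272 | XIII-92 | 1221 | 741 |
| 273 | XV-18 | - | - |
| 274 | XV-22 | 1099 | 561 |
| 275 | XV-24 | - | - |
| 276 | XV-25 | 1224 | 485 |
| 277 | XV-28 | - | - |
| 278 | XV-34 | - | - |
| 279 | XV-42 | | - |
| 280 | XV-68 | - | - |
| 281 | XV-74 | - | - |
| 282 | XV-93 | - | - |
| 283 | XV-94 | - | - |
| 284 | XV-96 | - | - |
| 285 | XVI-36 | 1056 | 435 |
| 286 | XVI-53 | 1230 | 741 |
| 287 | XVI-59 | - | - |
| 288 | XVI-66 | 1074 | 689 |
| 289 | XVI-76 | 1083 | 198 |
| 290 | XVI-77 | 1084 | 198 |
| 291 | XVII-07 | - | |
| 292 | XVII-08 | | - |
| 293 | XVII-17 | - | - |
| 294 | XVII-28 | - | |
| 295 | XVII-29 | - | - |
| 296 | XVII-31 | 1139 | 503 |
| 297 | XVII-36 | - | - |
| 298 | XVII-39 | - | - |
| 299 | XVII-40 | 1231 | 203 |
| 300 | XVII-48 | 1148 | 587 |
| 301 | XVII-55 | | |
| 302 | XVII-58 | | |
| 303 | XVII-67 | | |
| 304 | XVII-72 | | |
| 305 | XVII-76 | 1160 | 650 |
| 306 | XVII-82 | | |
| 307 | XVII-87 | 1165 | 502 |
| 308 | XVII-95 | 1172 | 648 |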

WO 2004/046382 



PCT/GB2003/005102 



Table 1b

List of sequences of probes informative for disease diagnosis

Please see the note at the bottom

| Clone ID | Sequence ID |
| I-09 | 298 |
| I-10 | 299 |
| I-13 | 1331 |
| I-14 | 1178 |
| I-15 | 300 |
| I-16 | 301 |
| I-17 | 302 |
| I-19 | 304 |
| I-20 | 305 |
| I-22 | 306 |
| I-23 | 307 |
| I-24 | 308 |
| I-25 | 309 |
| I-28 | 310 |
| I-30 | 1180 |
| I-31 | 311 |
| I-32 | 312 |
| I-34 | 313 |
| I-37 | 1440 |
| I-38 | 314 |
| I-39 | 315 |
| I-40 | 316 |
| I-42 | 1332 |
| I-44 | 317 |
| I-45 | 318 |
| I-46 | 319 |
| I-47 | 320 |
| I-48 | 321 |
| I-49 | 322 |
| I-53 | 323 |
| I-54 | 1181 |
| I-56 | 324 |
| I-57 | 325 |
| I-58 | 326 |
| I-60 | 327 |
| I-64 | 328 |
| I-67 | 330 |
| I-69 | 331 |
| I-71 | 332 |
| I-72 | 333 |
| I-73 | 334 |
| I-77 | 335 |
| I-79 | 336 |
| I-80 | 337 |
| I-81 | 338 |
| I-82 | 339 |
| I-86 | 1336 |
| I-88 | 1182 |
| I-95 | 1337 |
| II-02 | 360 |
| II-03 | 361 |
| II-05 | 363 |
| II-06 | 364 |
| II-07 | 365 |
| II-08 | 366 |
| II-09 | 367 |
| II-10 | 368 |
| II-11 | 369 |
| II-12 | 370 |
| II-13 | 371 |
| II-14 | 372 |
| II-15 | 373 |
| II-16 | 374 |
| II-17 | 375 |
| II-18 | 376 |
| II-20 | 377 |
| II-21 | 378 |
| II-22 | 379 |
| II-23 | 380 |
| II-24 | 381 |
| II-25 | 382 |
| II-26 | 383 |
| II-27 | 384 |
| II-28 | 385 |
| II-29 | 386 |
| II-30 | 387 |
| II-31 | 388 |
| II-32 | 389 |
| II-33 | 390 |
| II-34 | 391 |
| II-35 | 392 |
| II-37 | 393 |
| II-38 | 394 |
| II-39 | 395 |
| II-40 | 396 |
| II-41 | 397 |
| II-42 | 398 |
| II-43 | 399 |
| II-44 | 400 |
| II-46 | 401 |
| II-47 | 402 |
| II-48 | 403 |
| II-49 | 404 |
| II-50 | 405 |
| II-52 | 406 |
| II-53 | 407 |
| II-54 | 408 |
| II-55 | 409 |
| II-56 | 410 |
| II-57 | 411 |
| II-58 | 412 |
| II-59 | 413 |
| II-60 | 414 |
| II-61 | 415 |
| II-62 | 416 |
| II-63 | 417 |
| II-64 | 418 |
| II-65 | 419 |
| II-66 | 420 |
| II-67 | 421 |
| II-68 | 422 |
| II-69 | 423 |
| II-70 | 424 |
| II-71 | 425 |
| II-72 | 426 |
| II-73 | 427 |
| II-74 | 428 |
| II-75 | 429 |
| II-76 | 430 |
| II-77 | 431 |
| II-78 | 432 |
| II-79 | 433 |
| II-80 | 434 |
| II-81 | 435 |
| II-82 | 436 |
| II-83 | 437 |
| II-84 | 438 |
| II-85 | 439 |
| II-86 | 440 |
| II-87 | 441 |
| II-88 | 442 |
| II-89 | 443 |
| II-90 | 444 |
| II-91 | 445 |
| II-92 | 446 |
| II-93 | 447 |
| II-94 | 448 |
| II-95 | 449 |
| II-96 | 450 |
| III-01 | 452 |
| III-02 | 453 |
| III-03 | 454 |
| III-04 | 455 |
| III-05 | 457 |
| III-06 | 458 |
| III-07 | 459 |
| III-08 | |
| III-09 | [illegible] |
| [illegible] | [illegible] |
| III-12 | 463 |
| III-13 | 464 |
| III-14 | [illegible] |
| III-15 | [illegible] |
| III-16 | [illegible] |
| III-17 | [illegible] |
| III-18 | 469 |
| III-19 | [illegible] |
| III-20 | 1183 |
| III-21 | 471 |
| III-22 | [illegible] |
| III-23 | 473 |
| III-24 | 474 |
| III-25 | 475 |
| III-26 | 476 |
| III-27 | 477 |
| III-28 | 478 |
| III-29 | 479 |
| III-31 | [illegible] |
| III-32 | 482 |
| III-33 | |
| III-34 | [illegible] |
| III-35 | 485 |
| III-37 | [illegible] |
| III-39 | 487 |
| III-40 | 488 |
| III-42 | 489 |
| III-43 | 490 |
| III-44 | 491 |
| III-45 | 492 |
| III-46 | |
| III-47 | 494 |
| III-48 | |
| III-49 | [illegible] |
| III-50 | [illegible] |
| III-51 | [illegible] |
| III-52 | |
| III-53 | |
| III-54 | [illegible] |
| III-55 | |
| III-56 | |
| III-57 | 504 |
| III-58 | 505 |
| III-59 | 506 |
| III-61 | 507 |
| III-62 | 508 |
| III-63 | 509 |
| III-64 | 510 |
| III-65 | 511 |
| III-66 | 512 |
| III-67 | 513 |
| III-69 | 514 |
| III-70 | 515 |
| III-71 | 516 |
| III-73 | 517 |
| III-74 | 518 |
| III-75 | 519 |
| III-77 | 520 |
| III-78 | 521 |
| III-79 | 522 |
| III-80 | 523 |
| III-81 | 524 |
| III-82 | 1348 |
| III-83 | 525 |
| III-85 | 526 |
| III-86 | 527 |
| III-87 | 528 |
| III-88 | 529 |
| III-89 | 530 |
| III-91 | 531 |
| III-92 | 1351 |
| III-93 | 532 |
| III-94 | 533 |
| III-95 | 534 |
| III-96 | 535 |
| IV-02 | 681 |
| IV-04 | 682 |
| IV-13 | 683 |
| IV-14 | 684 |
| IV-15 | 1185 |
| IV-17 | 685 |
| IV-23 | 1353 |
| IV-26 | 1186 |
| IV-28 | 686 |
| IV-31 | 687 |
| IV-32 | 688 |
| IV-35 | 1355 |
| IV-37 | |
| IV-38 | 689 |
| IV-40 | 690 |
| IV-42 | 691 |
| IV-43 | 1239 |
| IV-44 | 692 |
| IV-47 | 693 |
| IV-53 | 61 |
| IV-55 | 694 |
| IV-56 | 695 |
| IV-61 | 696 |
| IV-64 | 697 |
| [illegible] | [illegible] |
| IV-69 | 192 |
| IV-72 | 699 |
| IV-73 | [illegible] |
| IV-80 | 701 |
| IV-82 | 196 |
| IV-85 | 702 |
| | [illegible] |
| [illegible] | [illegible] |
| IV-96 | 705 |
| IX-10 | 736 |
| IX-12 | 738 |
| IX-13 | 739 |
| IX-24 | 747 |
| IX-38 | 757 |
| IX-39 | 758 |
| IX-48 | 764 |
| IX-50 | 766 |
| IX-56 | 768 |
| IX-62 | 773 |
| IX-65 | 776 |
| IX-72 | 782 |
| IX-77 | 785 |
| IX-91 | 796 |
| IX-96 | 801 |
| V-01 | 1361 |
| V-03 | 706 |
| V-04 | 707 |
| V-07 | 708 |
| V-08 | 709 |
| V-09 | [illegible] |
| V-11 | 1188 |
| [illegible] | [illegible] |
| [illegible] | [illegible] |
| V-12 | 711 |
| V-17 | |
| [illegible] | [illegible] |
| V-20 | [illegible] |
| V-24 | 714 |
| V-25 | 1365 |
| V-28 | 1189 |
| [illegible] | 1366 |
| V-37 | [illegible] |
| [illegible] | 1190 |
| [illegible] | 1109 |
| V-40 | 717 |
| V-41 | 718 |
| V-47 | 1368 |
| V-48 | 719 |
| V-49 | 1369 |
| V-55 | 77 |
| V-57 | 720 |
| V-58 | 1370 |
| V-61 | 721 |
| V-64 | 722 |
| V-65 | 723 |
| V-68 | 1448 |
| V-71 | 1495 |
| V-74 | 724 |
| V-75 | 1372 |
| V-80 | 726 |
| V-81 | 727 |
| V-87 | 728 |
| V-90 | 1374 |
| VI-02 | 340 |
| VI-03 | 341 |
| VI-04 | 342 |
| VI-06 | 343 |
| VI-07 | 344 |
| VI-08 | 345 |
| VI-09 | 346 |
| VI-11 | 347 |
| VI-12 | 869 |
| VI-13 | 870 |
| VI-14 | 871 |
| VI-16 | 873 |
| VI-18 | 348 |
| VI-19 | 349 |
| VI-20 | 350 |
| VI-21 | 351 |
| VI-22 | 352 |
| VI-23 | 878 |
| VI-24 | 879 |
| VI-25 | 353 |
| VI-26 | 354 |
| VI-27 | 355 |
| VI-31 | 356 |
| VI-32 | 885 |
| VI-33 | 357 |
| VI-35 | 358 |
| VI-39 | 887 |
| VI-43 | 1382 |
| VI-44 | 1193 |
| VI-45 | 889 |
| VI-48 | 359 |
| VI-49 | 892 |
| VI-50 | 893 |
| VI-53 | 895 |
| VI-55 | 897 |
| VI-58 | 899 |
| VI-66 | 903 |
| VI-67 | 904 |
| VI-70 | 108 |
| VI-71 | 1387 |
| VI-74 | 905 |
| VI-75 | 906 |
| VI-76 | 907 |
| VI-77 | 110 |
| VI-79 | 1389 |
| VI-80 | 908 |
| VI-85 | 910 |
| VI-87 | 911 |
| VI-88 | 912 |
| VI-90 | 1390 |
| VI-93 | 1391 |
| VI-95 | 915 |
| VI-96 | 1392 |
| VII-02 | 547 |
| VII-03 | 548 |
| VII-04 | 549 |
| VII-05 | 550 |
| VII-06 | 551 |
| VII-07 | 552 |
| VII-08 | 553 |
| VII-09 | 554 |
| VII-10 | 555 |
| VII-11 | 556 |
| VII-12 | 557 |
| VII-14 | 558 |
| VII-15 | 559 |
| VII-17 | 560 |
| VII-18 | 561 |
| VII-19 | 562 |
| VII-20 | 563 |
| VII-21 | 564 |
| VII-22 | 565 |
| VII-23 | 566 |
| VII-24 | 567 |
| VII-25 | 1397 |
| VII-26 | 250 |
| VII-27 | 568 |
| VII-28 | 569 |
| VII-29 | 570 |
| VII-32 | 571 |
| VII-33 | 572 |
| VII-34 | 573 |
| VII-35 | |
| VII-36 | 575 |
| VII-39 | 576 |
| VII-40 | 577 |
| VII-41 | 578 |
| VII-42 | 579 |
| VII-43 | 580 |
| VII-44 | 581 |
| | 582 |
| VII-46 | 583 |
| VII-47 | 1200 |
| VII-48 | 584 |
| VII-49 | 585 |
| VII-50 | 586 |
| [illegible] | 587 |
| | 588 |
| VII-54 | 589 |
| VII-55 | 590 |
| | 591 |
| VII-58 | 592 |
| VII-59 | 593 |
| VII-62 | 594 |
| VII-63 | 595 |
| VII-64 | 596 |
| VII-65 | 597 |
| VII-66 | 598 |
| VII-67 | 1399 |
| VII-71 | 599 |
| VII-72 | 600 |
| VII-73 | 601 |
| VII-74 | 602 |
| VII-76 | 603 |
| VII-77 | 604 |
| VII-80 | 605 |
| VII-81 | 606 |
| VII-82 | 607 |
| VII-83 | 608 |
| VII-84 | 609 |
| VII-86 | 1453 |
| VII-87 | 610 |
| VII-89 | 611 |
| VII-90 | 612 |
| VII-91 | 613 |
| VII-92 | 614 |
| VII-93 | 615 |
| VII-94 | 616 |
| VII-96 | 617 |
| VIII-09 | 618 |
| VIII-10 | 619 |
| VIII-11 | 620 |
| VIII-12 | 621 |
| VIII-13 | 622 |
| VIII-15 | 623 |
| VIII-16 | 624 |
| VIII-17 | 625 |
| VIII-18 | 626 |
| VIII-19 | 627 |
| VIII-20 | 628 |
| VIII-21 | 629 |
| VIII-22 | 1455 |
| VIII-23 | 630 |
| VIII-24 | 631 |
| VIII-25 | 632 |
| VIII-26 | 1456 |
| VIII-27 | 633 |
| VIII-28 | 634 |
| VIII-29 | 635 |
| VIII-30 | 636 |
| VIII-31 | 637 |
| VIII-32 | 638 |
| VIII-33 | 639 |
| VIII-34 | 640 |
| VIII-36 | 641 |
| VIII-37 | 642 |
| VIII-38 | 643 |
| VIII-40 | 644 |
| VIII-41 | 645 |
| VIII-42 | 646 |
| VIII-43 | 647 |
| VIII-45 | 648 |
| VIII-46 | 649 |
| VIII-47 | 650 |
| VIII-48 | 651 |
| VIII-50 | 652 |
| VIII-51 | 653 |
| VIII-53 | 654 |
| VIII-54 | 655 |
| VIII-55 | 656 |
| VIII-56 | 657 |
| VIII-57 | 658 |
| VIII-58 | 659 |
| VIII-59 | 660 |
| VIII-60 | 661 |
| VIII-61 | 662 |
| VIII-64 | 663 |
| VIII-65 | 664 |
| VIII-66 | 665 |
| VIII-67 | 666 |
| VIII-68 | 667 |
| VIII-69 | 668 |
| VIII-70 | 669 |
| VIII-71 | 670 |
| VIII-72 | 671 |
| VIII-73 | 672 |
| VIII-74 | 673 |
| VIII-75 | 674 |
| VIII-76 | 675 |
| VIII-77 | 676 |
| VIII-78 | 677 |
| VIII-79 | 678 |
| VIII-80 | 679 |
| X-07 | 808 |
| X-15 | 814 |
| X-20 | 817 |
| X-29 | 821 |
| X-34 | 825 |
| X-46 | 833 |
| X-54 | 837 |
| X-56 | 839 |
| X-68 | 1207 |
| X-72 | 849 |
| X-73 | 1208 |
| X-94 | 860 |
| XI-13 | 1209 |
| XI-37 | 1460 |
| XI-43 | 1210 |
| XI-67 | 1211 |
| XI-81 | 1212 |
| XII-07 | 1213 |
| XII-35 | 1214 |
| XII-36 | 1215 |
| | 1216 |
| XII-65 | 1028 |
| XII-92 | 1217 |
| XIII-03 | 917 |
| XIII-04 | 1218 |
| XIII-19 | 1219 |
| XIII-24 | 926 |
| XIII-51 | 938 |
| XIII-52 | 939 |
| XIII-67 | 947 |
| XIII-69 | 949 |
| XIII-88 | 1220 |
| XIII-92 | 1221 |
| XV-22 | 1099 |
| XV-24 | 1101 |
| XV-25 | 1224 |
| XV-42 | 1108 |
| XV-62 | 1226 |
| XV-64 | 1118 |
| XV-84 | 1125 |
| XVI-19 | 1228 |
| XVI-36 | 1056 |
| XVI-53 | 1230 |
| XVI-60 | 1071 |
| XVI-66 | 1074 |
| XVI-74 | 1081 |
| XVI-76 | 1083 |
| XVI-77 | 1084 |
| XVII-31 | 1139 |
| XVII-40 | 1231 |
| XVII-48 | 1148 |
| XVII-76 | 1160 |
| XVII-87 | 1165 |
| XVII-95 | 1172 |

Note

Sequences not available for sequence IDs in Table 1, and corresponding sequence IDs in
Table 2 and 4.

298, 301, 305, 307, 312, 317, 318, 319, 320, 332, 333, 334, 336, 340, 341, 342, 343, 344, 345, 346, 347,
348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 367, 372, 375, 376, 377, 379, 385, 392, 393,
404, 437, 439, 440, 443, 444, 445, 449, 455, 457, 465, 466, 467, 468, 470, 486, 498, 501, 511, 514, 516,
517, 520, 522, 528, 531, 535, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 573, 584,
604, 608, 616, 620, 623, 640, 659, 662, 664, 667, 668, 673, 677, 678, 679, 681, 695, 702, 712, 716, 825,
886, 894, 902, 909, 916, 1101, 1108, 1109, 1177, 1187, 1193, 1204, 1220, 1239, 1255, 1256, 1342, 1347,
1354, 1357, 1362, 1363, 1364, 1373, 1375, 1379, 1403, 1404, 1405, 1406, 1413
Table 2a

List of informative probes for diagnosis of breast cancer

| Clone ID | Sequence ID |
| I-24 | 308 |
| I-28 | 310 |
| I-30 | 1180 |
| I-52 | - |
| I-54 | 1181 |
| II-41 | 397 |
| II-70 | 424 |
| II-87 | 441 |
| III-06 | 458 |
| III-20 | 1183 |
| III-40 | 488 |
| III-57 | 504 |
| III-60 | - |
| III-61 | 507 |
| III-89 | 530 |
| IV-14 | 684 |
| IV-15 | 1185 |
| IV-26 | 1180 |
| IV-32 | 688 |
| IV-41 | - |
| IV-53 | 61 |
| IV-62 | |
| IV-69 | 192 |
| IV-80 | 701 |
| IV-82 | 196 |
| IX-10 | |
| IX-12 | |
| IX-38 | 757 |
| IX-39 | 758 |
| IX-42 | |
| IX-48 | 764 |
| [illegible] | [illegible] |
| V-11 | 1188 |
| V-32 | |
| V-39 | |
| V-55 | 77 |
| V-80 | 726 |
| V-94 | |
| VI-07 | 93 |
| VI-34 | |
| VI-41 | |
| VI-48 | 891 |
| VI-49 | |
| VI-52 | |
| VI-55 | 897 |
| VI-65 | |
| VI-70 | 108 |
| VI-72 | - |
| VI-78 | - |
| VI-84 | - |
| VII-03 | 1196 |
| VII-15 | 1199 |
| VII-32 | 571 |
| VII-39 | 576 |
| VII-47 | 1200 |
| VII-48 | 1201 |
| VII-60 | - |
| VII-73 | 601 |
| VII-77 | 1203 |
| VII-90 | 612 |
| VIII-20 | 628 |
| VIII-29 | 635 |
| VIII-30 | 636 |
| VIII-31 | 637 |
| VIII-39 | |
| VIII-44 | - |
| VIII-46 | 649 |
| VIII-48 | 651 |
| VIII-66 | 665 |
| VIII-74 | - |
| VIII-76 | 675 |
| X-04 | |
| X-07 | 808 |
| X-15 | 814 |
| X-29 | 821 |
| X-34 | - |
| X-35 | |
| X-54 | 837 |
| X-56 | 839 |
| X-68 | 1207 |
| X-72 | 849 |
| X-94 | 860 |
| XI-07 | |
| XI-13 | 1209 |
| XI-50 | |
| XI-58 | |
| XI-81 | 1212 |
| XII-07 | 1213 |
| XII-17 | |
| XII-26 | |
| XII-27 | |
| XII-31 | |
| XII-32 | |
| XII-35 | 1214 |
| XII-36 | - |
| XII-52 | - |
| XII-59 | 1216 |
| XIII-19 | 1219 |
| XIII-29 | - |
| XIII-52 | 939 |
| XIII-62 | - |
| XIII-84 | |
| XIII-92 | 1221 |
| XV-18 | - |
| XV-22 | 1099 |
| XV-24 | |
| XV-25 | 1224 |
| XV-28 | - |
| XV-34 | - |
| XV-42 | - |
| XV-68 | |
| XV-74 | - |
| XV-93 | |
| XV-94 | - |
| XV-96 | |
| XVI-36 | 1056 |
| XVI-53 | 1230 |
| XVI-59 | - |
| XVI-66 | 1074 |
| XVI-76 | 1083 |
| XVI-77 | 1084 |
| XVII-07 | - |
| XVII-08 | - |
| XVII-17 | - |
| XVII-28 | - |
| XVII-29 | - |
| XVII-31 | 1139 |
| XVII-36 | - |
| XVII-39 | - |
| XVII-40 | 1231 |
| XVII-48 | 1148 |
| XVII-55 | |
| XVII-58 | |
| XVII-67 | |
| XVII-72 | |
| XVII-76 | 1160 |
| XVII-82 | |
| XVII-87 | 1165 |
| XVII-95 | 1172 |
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Table 2t» 

List of sequences of probes informative for breast cancer 

Please see the note at the bottom of Table 1. Some sequences are missing. 



done id 




1-13 


1331 


1-14 


1178 


I-24 


308 


I-25 


309 


I-28 


310 


I-30 


1180 


I-37 


1440 


I-42 


1332 


!-48 


321 


I-54 


1181 


I-60 


327 


I-72 


1335 


1-81 


338 


I-82 


339 


I-86 


1336 


1-88 


1182 


1-95 


1337 


11-02 


360 


11-03 


361 


11-06 


364 


U-07 


365 


11-10 


368 


11-21 


378 


II-23 


380 


II-24 


381 


II-25 


382 


II-27 


384 


II-33 


390 


ll-o4 


oy i 


11-41 


397 


II-42 


398 


II-46 


401 


II-47 


1338 


II-48 


403 


II-52 


406 


U-57 


411 


H-58 


412 


II-59 


413 


U-60 


414 


11-61 


415 


II-62 


416 


II-64 


418 
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11-67 


421 


11-69 


423 


11-70 


424 


11-74 


428 


11-80 


434 


11-82 


436 


11-84 


438 


11-87 


441 


11-88 


442 


11-96 


450 


1(1-01 


452 


IH-02 


453 


III-06 


458 


III-08 


460 


111-12 


463 


111-13 


464 


111-17 


1344 


111-18 ! 


469 


HI-20 


1183 


111-21 


471 


lH-23 


473 


11I-24 


474 


IH-25 


475 


Ul-26 


476 


UI-27 


477 


MI-28 


478 


III-29 


479 


III-32 


482 


III-33 


483 


HI-35 


485 


ltl-39 


487 


HI-40 


488 


IH-42 


489 


III-45 


492 


W-46 


493 


UI-47 


494 


HI-48 


495 


Ul-56 


503 


lli-57 


504 


IH-58 


505 


III-59 


506 


111-61 


507 


IH-62 


508 


m-63 


509 


III-64 


510 


lli-66 


512 


lH-67 


513 


III-70 


515 


UI-74 


518 


Ul-75 


519 


III-78 


521 
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m-80 


523 


111-81 


524 


III-82 


1348 


HI-85 


526 


III-86 


527 


III-88 


529 


III-89 


530 


III-92 


1351 


III-93 


532 


III-95 


534 


III-96 


1352 


IV-04 


682 


IV-13 


683 


IV-14 


684 


IV-15 


1185 


IV-17 


685 


IV-23 


1353 


IV-26 


1186 


IV-31 


687 


IV-32 


688 


IV-35 


1355 


IV-37 


96 


IV-38 


689 


IV-42 


691 


IV-43 


1239 


IV-47 


693 


IV-53 


61 


IV-61 


696 


IV-64 


697 


IV-69 


192 


IV-72 


699 


IV-80 


701 


IV-82 


196 


IV-85 


702 


IV-93 


1360 


IV-96 


705 


IX-10 


736 


IX-12 


738 


IX-13 


739 


IX-24 


747 


IX-38 


757 


IX-39 


758 


IX-48 


764 


IX-50 


766 


IX-56 


768 


lX-62 


773 


IX-65 


776 


IX-72 


782 


IX-77 


785 


IX-91 


796 


IX-96 


801 
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V-01 


1361 


V-03 


706 


\A-04 


707 


V-07 

V \J 1 


708 


V-08 


709 


U I -I 

V 1 1 


1 188 




711 


\/-1 7 

V"* I / 


1364 


\/-9A 

V ~ 


714 


V &*\J 


1365 


\/-9ft 


1189 


v-oo 


1366 


v-OO 


1 190 

1 1 3U 




1 109 


V-4 I 


71 ft 


V/-4.7 


1 368 


v-**y 


1 ^RQ 


v-oo 


77 


\/ A7 
v-O / 


7?n 


v-OO 


1 370 


v-o I 


721 


\/_R4 


722 


v-oo 


1371 


V/-fift 
v-oo 


1448 


\/ 71 
V- / I 




V-/ *r 


724 


\/ 7^ 
V- f o 


1 V7? 


v-ou 


726 


\/ on 


1374 


V rUO 


864 


\/i-04 


865 


VI-07 

V 1 w / 


93 


X/U08 


867 




1378 


V 1 1 ^- 


869 


V I 1 o 


870 


V 1 1 *T 


871 


Vr lu 


873 


VI- I %7 


875 


VI-20 

V 1 fcU 


876 


V I 1 


1380 


Vl-23 

V I ^_ o 


878 


VI-24 


879 


V 1 £-\J 


1 192 


Vl-26 


881 


Vt-32 


885 


VI-39 


887 


VI-43 


1382 


VI-44 


1193 


VI-45 


889 


VI-48 


891 



WO 2004/046382 



PCT/GB2003/005 102 



VI-49 


892 


VI-50 


893 


Vi-53 


895 


VI-55 


897 


VI-58 


899 


VI-66 . 


903 


VI-67 


904 


VI-70 


108 


VI-71 


1387 


VI-74 


905 


VI-75 


906 


VI-76 


907 


VI-77 


110 


VI-79 


1389 


VI-80 


908 


VI-85 


910 


VI-87 


911 


VI-88 


912 


VI-90 


1390 


VI-93 


1391 


VI-95 


915 


VN96 


1392 


VII-02 


1195 


VII-03 


1196 


VII-06 


1394 


VII-08 


1197 


VII-09 


1198 


VII-10 


1395 


VII-11 


1396 


VIMS 


1199 


VII-17 


560 


VIM 9 


562 


Vll-21 


564 


VII-22 


565 


VII-23 


566 


VII-24 


567 


VII-25 


1397 


VII-26 


250 


VII-27 


568 


VII-29 


570 


VII-32 


571 


VII-33 


572 


VII-36 


575 


VII-39 


576 


VII-41 


578 


VII-42 


579 


VII-43 


580 


VII-46 


583 


VII-47 


1200 


Vll-48 


1201 


VII-49 


585 



WO 2004/046382 



PCT/GB2003/005102 



VI 1-54 




VI 1-57 




VI 1-58 


coo 
592 


VI 1-59 




VI 1-62 


tZQA 


VI 1-63 




VI 1-64 


HOC 

59o 


VI 1-66 


coo 


VI 1-67 ! 


i oyy 


VI 1-72 


con 

ouu 


VI 1-73 


601 


\ /i i -7-7 
VI 1-77 


120o 


VI 1-80 


£> nn 
605 


VI 1-82 


60/ 


VI 1-8 6 


<i A O 

1453 


VI 1-87 


610 


VII-90 


cio 
612 


VI 1-91 


613 


Vll-92 


^ a a 

614 


VH-93 


615 


Vll-96 


617 


VIII-09 


^ A Q 
610 


VUI-10 


619 


VIII-13 


622 


VMI-16 


624 


VIII-20 


628 


VIII-21 


629 


Vlll-22 


A A CC 

1 455 


VIII-23 




VIII-24 


DOT 


VIII-25 


COO 


VI 11-26 


•1 yl CR 


\ /i 11 0*7 
VIII-27 




% it 11 0 0 
VI 11-28 


Oo4 


VIII-29 


££OC 


\ /1 1 1 on 

VIII-30 


DoO 


% /1 11 0 «i 
Vlll-31 


CQ7 

0*5 f 


VIII-32 


DoO 


VIII-33 


con 


\ /l 1 1 O .4 

VNI-34 


*i on>i 


VIH-38 


D4o 


V #111 Jf\ 

VIII-40 


D44 


VIII-41 


e/c 
D4D 


win ii e 

Vlll-46 


D4y 


\ /til ilO 




VIII-55 


656 


VIII-57 


658 


VIII-59 


660 


VIII-60 


661 


VIII-61 


1205 


VIII-64 


663 
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VI 11-66 


665 


VIII-73 


672 


VIII-74 


673 


VIH-76 


675 


VIII-80 


679 


X-07 


808 


X-15 


814 


X-20 


817 


X-29 


821 


X-34 


825 


X-46 


833 


X-54 


837 


X-56 


839 


X-68 


1207 


X-72 


849 


X-73 


1208 


X-94 


860 


Xl-13 


1209 


XI-37 


1460 


XI-43 


1210 


XI-67 


1211 


XI-81 


1212 


Xll-07 


1213 


XII-35 


1214 


Xll-36 


1215 


XII-59 


1216 


XII-65 


1028 


XII-92 


1217 


XIII-03


917 


XIII-04 


1218 


XIII-19 


1219 


XIII-24


926 


XIII-51 


938 


XIII-52 


939 


XIII-67


947 


XIII-69 


949 


XIII-88 


1220 


XIII-92


1221 


XV-22 


1099 


XV-24 


1101 


XV-25 


1224 


XV-42 


1108 


XV-62 


1226 


XV-64 


1118 


XV-84 


1125 


XVI-19 


1228 


XVI-36 


1056 


XVI-53 


1230 


XVI-60 


1071 


XVI-66


1074 


XVI-74 


1081
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XVI-76 


1083


XVI-77 


1084 


XVII-31 


1139 


XVII-40 


1231 


XVII-48 


1148 


XVII-76 


1160 


XVII-87


1165 


XVII-95


1172 
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Table 3 



List of informative probes (Clone ID) selected for breast cancer diagnosis based 
on their occurrence criterion during variable selection 



Occurrence* 


Clone ID 


100% 


XI-8, XVI-66, VIII-66, XVI-59, VII-03, XIII-19, XII-35, X-35, XI-50,
X-26, IV-53, XIII-29, XIII-62, I-30, III-06, XV-22, XV-94, VII-15,
VII-39, IX-39, XVII-39, III-40, VII-32


90%


I-52, VI-65, VI-34, IV-62, XV-34, XVII-58, V-11, VI-78, XII-36, XIII-92,
VIII-29, XVI-53, XVI-77, XI-13, XIII-84, IV-14, XII-31, V-80, VII-48,
XVII-29, XVII-72


80%


III-60, VIII-74, IX-12, X-04, XIII-52, VIII-30, IX-38


70%


VI-49, X-29, VIII-48


60%


IV-82, IX-10, VI-52, X-68, VII-77


50%


IV-15


40%


XV-28, II-70, V-55


30%


XVII-17, XVII-67


20%


XI-58, XVI-36, VIII-39, VIII-44, III-61, IV-69, XV-68, X-72


10%


IX-42, IX-77, X-94, XV-96, XVII-55


5%


XII-59, XVI-76, I-54, XV-18, V-94, X-54, VI-07, VII-47, XVII-31,
XVII-87, XVII-48


In at least one model


II-41, VI-41, III-57, III-89, VII-73, XV-25, IV-26, X-34, IV-41, VII-90,
XV-42, XVII-82, XII-27, VIII-20, I-28, VII-60, VIII-76, III-20, VI-84,
XI-07, XVII-28, XII-17, XVII-36, XII-52, XVII-76, VIII-46, VI-70,
XV-74, XV-93, VIII-31, II-87, V-39, VI-53, X-07, X-15, XII-07,
XVII-07, XVII-08, XVII-95, I-24, IV-32, V-32, VI-48, VI-72, IV-80,
IX-48, X-56, XV-24, XII-32, XVII-40



*100% = Genes appearing in all the 75 cross validated models; 90% = Additional genes
appearing in at least 68 out of 75 cross validated models; 5% = Additional genes appearing in
at least 4 out of 75 cross validated models, and so on.
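
By way of illustration only, the occurrence criterion described in the note to Table 3 may be sketched as follows. This Python sketch is not part of the original disclosure: the function names `tier_threshold` and `occurrence_tiers`, the toy probe names "A" to "E" and the reduced set of tiers are hypothetical. It assumes that each percentage corresponds to the smallest whole number of models reaching that fraction, which agrees with the two figures stated in the note (68 of 75 for 90%, 4 of 75 for 5%).

```python
from collections import Counter


def tier_threshold(percent, n_models):
    """Smallest model count that satisfies a percentage tier (integer ceiling)."""
    return (percent * n_models + 99) // 100


def occurrence_tiers(models, tiers=(100, 90, 5)):
    """Assign each probe to the highest occurrence tier it reaches.

    models: one set of selected clone IDs per cross validated model.
    tiers:  percentages in descending order (an illustrative subset of Table 3).
    """
    n_models = len(models)
    # Count in how many models each probe was selected.
    counts = Counter(probe for selected in models for probe in set(selected))
    grouped = {percent: [] for percent in tiers}
    grouped["at least one model"] = []
    for probe in sorted(counts):
        for percent in tiers:
            if counts[probe] >= tier_threshold(percent, n_models):
                grouped[percent].append(probe)
                break
        else:
            # Selected somewhere, but below the lowest percentage tier.
            grouped["at least one model"].append(probe)
    return grouped


# Hypothetical toy data: 75 models; the probe names are placeholders.
models = [set() for _ in range(75)]
for count, probe in [(75, "A"), (68, "B"), (67, "C"), (4, "D"), (1, "E")]:
    for selected in models[:count]:
        selected.add(probe)

tiers = occurrence_tiers(models)
print(tiers)
```

Each probe is listed once, under the highest tier it reaches, which matches the "additional genes" wording of the note.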
Table 4a



List of informative probes for diagnosis of Alzheimer disease


Clone ID 


Sequence ID 




Clone ID 


Sequence ID 


I-01


- 




III-60




I-02 


- 




III-63 


509 


I-13


- 




III-68




I-21


- 




III-74 


518


I-34 


313 




III-80


523 


I-37 


- 




III-82




I-42 


- 




III-85 


526


I-58 


326 




III-92




I-71


_ 




III-96




I-72






IV-23 




I-86






IV-26 




I-95


m 




IV-29




II-03


361 




IV-31 


687 


II-05


363 




IV-34 




II-06


364 




IV-35 




II-10


368 




IV-45 




II-24 


381 




IV-80 


701 


II-25


382 




IV-82 




II-26


383 




IV-93 




II-33


390 




V-01 


_ 


II-34


391 




V-02 




II-42


398 




V-03 


706


II-47


- 




V-04 


707 


II-57


411 




V-06 


- 


II-61


415 




V-07 


708 


II-69 


423 




V-12 


711 


II-75 


429 




V-15 




II-83


- 




V-17 


- 


II-84 


438 




V-21 


j 


II-88


442 




V-25 


- 


II-90 


- 




V-35 


- 


II-94 


448 




V-42 




III-02


453 




V-43 


- 


III-05


- 




V-47 


- 


III-06


458 




V-49 




III-08 


460 




V-52 




III-10






V-54 




III-13


464 




V-58 




III-15






V-59 




III-17






V-65




III-23 


473 




V-68 




III-26


476 




V-71 




III-35 


485 




V-75 




III-39 


487 




V-79 




III-43 


490 




V-80 


726 


III-44 


491 




V-90 




III-53


500 




V-91 




III-56


503 




V-92 
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Clone ID 


Sequence ID 


VI-02 




VI-04 


865 


VI-09 




VI-10 


_ 


VI-12 


869 


VI-14 


871 


VI-17 




VI-20


876 


VI-21 


m 


VI-23 


878 


VI-41 




VI-42 




VI-43 


• 


VI-44




VI-48 


891 


VI-49 




VI-50 


893 


VI-53 


895 


VI-71




VI-74 


905 


VI-76 


907 


VI-78 




VI-79 




VI-87 


911 


VI-88 


912 


VI-90 




VI-93




VI-95 


915 


VI-96 




VII-02




VII-03 




VII-06 




VII-10 


- 


VII-11




VII-19 


562 


VII-21


564 


VII-25 


• 


VII-36 


575 


VII-42 


579 


VII-43 


580 


VII-46 


583 


VII-59 


593 


VII-63 


595 


VII-66


598


VII-67


- 


VII-72 


600 


VII-73 


601 


VII-75


- 


VI-02 




VI-04 


866 


VI-09 




VI-10 




VI-12 


B73 


VI-14 


875 


VI-17 
Clone ID


Sequence ID




DlO 






VII Ml 








i VIIM13 




VI 11-06 






DIO 


« lit 1 w 




VIIU1 R 

V HI* 1 O 














L wS^r 






VIII 


coo 


VII 

w 111 


can 


VIII-41


645


VIII-42


646


VIII-48


651 


1AII CO 

Vlil-oo 




VIII-64


663


VIII-67


666


VIII-70




VIII-82




VIII-83




VIII-85




VIII-87




VIII-91




VIII-92




VIII-93




VIII-95 
Table 4b

List of sequences of probes informative for Alzheimer disease 
Please see note to Table 1 



Clone ID 


Sequence ID


I-09


298


I-10


299


I-15


300


I-16


301


I-17


302


I-19


304


I-20


305


I-22


306


I-23


307


I-24


308


I-25


309


I-28


310


I-31


311


I-32


312


I-34


313


I-38


314


I-39


315


I-40


316


I-44


317


I-45


318


I-46


319


I-47


320


I-48


321


I-49


322


I-53


323


I-56


324


I-57


325


I-58


326


I-60


327


I-64


328


I-67


330


I-69


331


I-71


332


I-72


333


I-73


334


I-77


335


I-79


336


I-80


337


I-81


338


I-82


339


VI-02


340
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VI-03 


341 


VI-04


342


VI-06


343


VI-07


344


VI-08


345


VI-09


346


VI-11


347


VI-18


348


VI-19


349


VI-20


350


VI-21


351


VI-22


352


VI-25


353


VI-26


354


VI-27


355


VI-31


356


VI-33


357


VI-35


358


VI-48


359


II-02


360


II-03


361


II-05


363


II-06


364


II-07


365


II-08


366


II-09


367


II-10


368


II-11


369


II-12


370


II-13


371


II-14


372


II-15


373


II-16


374


II-17


375


II-18


376


II-20


377


II-21


378


II-22


379


II-23


380


II-24


381


II-25


382


II-26


383


II-27


384


II-28


385


II-29


386


II-30


387


II-31


388


II-32


389


II-33


390


II-34


391


II-35


392
II-37


393


II-38


394


II-39


395


II-40


396


II-41


397


II-42


398


II-43


399


II-44


400


II-46


401


II-47


402


II-48


403




404 


ii 


405 i 


II-52


406


II-53


407


II-54


408


II-55


409


II-56


410


II-57


411


II-58


412


II-59


413


II-60


414


II-61


415


II-62


416


II-63


417


II-64


418


II-65


419


II-66


420


II-67


421


II-68


422


II-69


423


II-70


424


II-71


425


II-72


426


II-73


427


II-74


428


II-75


429


II-76


430


II-77


431


II-78


432


II-79


433


II-80


434


II-81


435


II-82


436


II-83


437


II-84


438


II-85


439


II-86


440


II-87


441


II-88


442


II-89


443
II-90


444


II-91


445


II-92


446


II-93


447


II-94


448


II-95


449


II-96


450


III-01


452


III-02


453


III-03


454


III-04


455


III-05


457


III-06


458


III-07


459


III-08


460


III-09


461


III-11


462


III-12


463


III-13


464


III-14


465


III-15


466


III-16


467


III-17


468


III-18


469


III-19


470


III-21


471


III-22


472


III-23


473


III-24


474


III-25


475


III-26


476


III-27


477


III-28


478


III-29


479


III-31


481


III-32


482


III-33


483


III-34


484


III-35


485


III-37


486


III-39


487


III-40


488


III-42


489


III-43


490


III-44


491


III-45


492


III-46


493


III-47


494


III-48


495


III-49


496


III-50


497
III-51


498


III-52


499


III-53


500


III-54


501


III-55


502


III-56


503


III-57


504


III-58


505


III-59


506


III-61


507


III-62


508


III-63


509


III-64


510


III-65


511


III-66


512


III-67


513


III-69


514


III-70


515


III-71


516


III-73


517


III-74


518


III-75


519


III-77


520


III-78


521


III-79


522


III-80


523


III-81


524


III-83


525


III-85


526


III-86


527


III-87


528


III-88


529


III-89


530


III-91


531


III-93


532


III-94


533


III-95


534


III-96


535


VII-02


547


VII-03


548


VII-04


549


VII-05


550


VII-06


551


VII-07


552


VII-08


553


VII-09


554


VII-10


555


VII-11


556


VII-12


557


VII-14


558


VII-15


559
VII-17


560 


VII-18


561 


VII-19


562 


VII-20 


563 


VII-21


564 


VII-22 


565 


VII-23 


566 


VII-24 


567 


VII-27 


568 


VII-28 


569 


VII-29 


570 


VII-32 


571 


VII-33 


572 


VII-34 


573 


VII-35 


574 


VII-36 


575 


VII-39 


576 


VII-40 


577 


VII-41 


578 


VII-42 


579 


VII-43 


580 


VII-44 


581 


VII-45 


582 


VII-46 


583 


VII-48 


584 


VII-49 


585 


VII-50 


586 


VII-52 


587 


VII-53 


588 


VII-54 


589 


VII-55 


590 


VII-57 


591 


VII-58 


592 


VII-59 


593 


VII-62 


594 


VII-63 


595 


VII-64 


596 


VII-65 


597 


VII-66 


598 


VII-71 


599 


VII-72 


600 


VII-73 


601 


VII-74 


602 


VII-76 


603 


VII-77 


604 


VII-80 


605 


VII-81 


606 


VII-82 


607 


VII-83


608 


VII-84 


609 


VII-87 


610 
VII-89


611


VII-90


612


VII-91


613


VII-92


614


VII-93


615


VII-94


616


VII-96


617


VIII-09


618


VIII-10


619


VIII-11


620


VIII-12


621


VIII-13


622


VIII-15


623


VIII-16


624


VIII-17


625


VIII-18


626


VIII-19


627


VIII-20


628


VIII-21


629


VIII-23


630


VIII-24


631


VIII-25


632


VIII-28


634


VIII-29


635


VIII-30


636


VIII-31


637


VIII-32


638


VIII-33


639


VIII-34


640


VIII-36


641


VIII-37


642


VIII-38


643


VIII-40


644


VIII-41


645


VIII-42


646


VIII-43


647


VIII-45


648


VIII-46


649


VIII-47


650


VIII-48


651


VIII-50


652


VIII-51


653


VIII-53


654


VIII-54


655


VIII-55


656


VIII-56


657


VIII-57


658


VIII-58


659


VIII-59


660


VIII-60


661


VIII-61


662
VIII-64


663 


VIII-65 


664 


VIII-66 


665 


VIII-67


666 


VIII-68 


667 


VIII-69 


668 


VIII-70 


669 


VIII-71 


670 


VIII-72 


671 


VIII-73 


672 


VIII-74 


673 


VIII-75 


674 


VIII-76 


675 


VIII-77 


676 


VIII-78 


677 


VIII-79 


678 


VIII-80 


679 


IV-02 


681 


IV-04 


682 


IV-13 


683 


IV-14 


684 


IV-17 


685 


IV-28 


686 


IV-31 


687 


IV-32 


688 


IV-38 


689 


IV-40 


690 


IV-42 


691 


IV-44 


692 


IV-47 


693 


IV-55 


694 


IV-56 


695 


IV-61 


696 


IV-64 


697 


IV-65 


698 


IV-72 


699 


IV-73 


700 


IV-80 


701 


IV-85 


702 


IV-93 


703 


IV-95 


704 


IV-96 


705 


V-03 


706 


V-04


707 


V-07 


708 


V-08 


709 


V-09 


710 


V-12 


711 


V-18 


712 


V-20 


713 


V-24 


714 
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V-37 


716 


V-40 


717 


V-41 


718 


V-48 


719 


V-57 


720 


V-61 


721 


V-64 


722 


V-65 


723 


V-74 


724 


V-80 


726 


V-81 


727 


V-87 


728 






VI-13 


870 


VI-14 


871 


VI-16


873 


VI-23 


878 


VI-24


879 


VI-28 


883 


VI-32


885 


VI-38 


886 


VI-39 


887 


VI-45 


889 


VI-46 


890 


VI-49 


892 


VI-50 


893 


VI-52 


894 


VI-53 


895 


VI-54 


896 


VI-55 


897 


VI-57 


898 


VI-58 


899 


VI-63 


900 


VI-65 


902 


VI-66 


903 


VI-67 


904 


VI-74 


905 


VI-75 


906 


VI-76


907 


VI-80 


908 


VI-81 


909 


VI-85 


910 


VI-87 


911 


VI-88 


912 


VI-91 


913 


VI-94 


914 


VI-95 


915 


VI-96 


916 


I-13


1177 


I-14


1178 


I-30 


1180 
I-54


1181 


I-88


1182 


III-20


1183 


IV-15


1185 


IV-26 


1186


IV-62 


1187 


V-11


1188 


V-28 


1189


V-38 


1190 


V-45 


1191 


VI-44 


1193 




1200 


I-42


1332 


I-52


1333 


l"OU 


1336




1337 


III-10


1342 


III-60 


1347 


III-82 


1348 


III-92 


1351 


IV-23 


1353 


IV-34 


1354 


IV-35 


1355 


IV-41 


1356 


IV-45


1357 


IV-82 


1359 


V-01


1361 


V-02 


1362 


V-06 


1363 


V-17


1364 


V-25 


1365 


V-35 


1366 


V-42 


1367 


V-47 


1368 


V-49 


1369 


V-58 


1370 


V-75 


1372 


V-79 


1373 


V-90 


1374 


V-91 


1375 


V-94 


1376 


VI-10 


1379 


VI-41 


1381 


VI-43 


1382 


VI-71 


1387 


VI-72 


1388 


VI-79


1389 


VI-90 


1390 


VI-93 


1391 


VII-25 


1397 


VII-60


1398 
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VII-67 


1399 


VIII-22


1403 


VIII-26


1404 


VIII-39 


1405 


VIII-44


1406 


I-37


1440 


V-32 


1445 


V-52 


1447 


V-68 


1448 


V-92 


1449 


VI-42 


1450 


VI-78 


1452 


VII-86


1453 


VII-88 


1454 


IV-29 


1490 


V-15 


1491 


V-39 


1492 


V-54 


1493 


V-59 


1494 


V-71 


1495 
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Table 5 



Samples 



Diagnosis 


No. of women 


Normal /Benign 


42* 


DCIS 


3 


Invasive cancer 


26 



* From one woman, whole blood was collected at weeks 1,2,3,4,5 following menstruation. 
Hence, the number of unique normal/benign samples tested in the experiment is 75 



Information about women with breast cancer 



Sample 


AGE 


Stage 


Cancer type 


Size hist.
(mm) 


Nodes 


1 


51 


II 


IDC 


20 


1/7 


2 


84 


II 


IDC 


22 


2/2 


3 


50 


I 


DCIS + IDC


>50 DCIS; 
5 x 14 


0/7 


4 


47 


I 


IDC 


15 


0 


5 


69 


III 


ELC g.2 + tubular 


50 + 3 


1 of 12 + 1 of 7


6


ou 


II


mp 




0


7 




I


IDC 


15 


0 


8 


63 


II 


IDC


23 


0 


9 


55 


I


IDC + DCIS 


4 


0 of 1


10 


52 


0


DCIS + small 
colloid carcinoma 
foci 


50 + 3 


0 


11 


60 


II 


IDC 


24 


0 


12 


54 


I


IDC 


11 


0 


13 




0 


DCIS 


20 


0 


14 


49 


0 


DCIS 


9 


0 


15 


48 


I


IDC 


4 


0 


16 


56 


I


IDC 


4 


0 


17 


68 


I


IDC


14 


0 


18 


68 


I




7 


0 


19 


63 


I


IDC


10 


0 


20 


45 


I


IDC


19 


1 


21 


57 


III 


IDC 


60 


8/20 


22 


55 


II 


IDC/DCIS


35+55 


0 


23 


71 


I


IDC/extensive 
DCIS 


8 


0 


24 


56 


I


IDC 


9 


? 
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25 


66 


II 


IDC 


26 


0 


26 


66 


I 


IDC 


15 


7 


27 


61 


I 


IDC


9 


? 


28 


? 


? 


?


? 


?


29 


65 


I 


IDC 


11 


0 



Other diseases/conditions present in the women tested



Disease/condition 



Diabetes 



Asthma 



Ulcerative colitis
Hemochromatosis



Crohn's disease 



Fibromyalgia 



Psoriasis
Atopic eczema 



Rheumatism 



Allergies



Cancer type 


No. of women 


Breast 


3 


Colon 


2 


Stomach 


1 


Skin 


1 
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Table. % 

Some relevant features of the blood donors. B, female donors with breast cancer; N, female
donors with suspect mammogram but no breast cancer; IDC, invasive ductal carcinoma; DCIS,
ductal carcinoma in situ; na, not available; nd, not determined; ++, no degradation of mRNA and no
ribosomal contamination in the sample; +, no degradation of mRNA but ribosomal contamination in the
sample.







AGE 


Cancer type
/breast abnormality


Size Hist
(mm)


mRNA 
Quality 


1 


B1 


na 


IDC 


5 


++ 


2 


B2 


49 


DCIS 


8 


nd 


3 


B3 


54 


IDC 


18


++ 


4 


B4 


59 


IDC 


12 


+ 


5 


B5 


61 


DCIS+micro 
invasive cancer 


15+1.5 


++ 


6 


B6 


55 


IDC 


12+17 


nd 


7 


B6 




IDC 


12+17 


nd 


8 


N1 


45 


Fibroadenoma 




nd 


9 


N2 


52 


na 




+ 


10 


N3 


55 


Cyst 




++ 


11 


N4 


54 


na 




++ 


12 


N5 


51 


Benign ductal
epithelium




nd 


13 


N6 


57 


Benign 




nd 


14 


N7 


50 


na 




++ 


15 


N8 


52 


na 
List of sequences of probes informative for both Alzheimer disease and breast cancer



Clone ID 


Sequence ID 


I-24


308


I-25


309


I-28


310


I-48


321


I-60


327


I-72


333


I-81


338


I-82


339


II-02


360


II-03


361


II-06


364


II-07


365


II-10


368


II-21


378


II-23


380


II-24


381


II-25


382


II-27


384


II-33


390


II-34


391


II-41


397


II-42


398


II-46


401


II-47


402


II-48


403


II-52


406


II-57


411


II-58


412


II-59


413


II-60


414


II-61


415


II-62


416


II-64


418


II-67


421


II-69


423


II-70


424


II-74


428


II-80


434


II-82


436


II-84


438
II-87


441


II-88


442


II-96


450


III-01


452


III-02


453


III-06


458


III-08


460


III-12


463


III-13


464


III-17


468


III-18


469


III-21


471


III-23


473


III-24


474


III-25


475


III-26


476


III-27


477


III-28


478


III-29


479


III-32


482


III-33


483


III-35


485


III-39


487


III-40


488


III-42


489


III-45


492


III-46


493


III-47


494


III-48


495


III-56


503


III-57


504


III-58


505


III-59


506


III-61


507


III-62


508


III-63


509


III-64


510


III-66


512


III-67


513


III-70


515


III-74


518


III-75


519


III-78


521


III-80


523


III-81


524


III-85


526


III-86


527


III-88


529


III-89


530


III-93


532


III-95


534
III-96


535 


IV-04 


682 


IV-13 


683 


IV-14 


684 


IV-17 


685 


IV-31 


687 


IV-32 


688 


IV-38 


689 


IV-42 


691 


IV-47 


693 


IV-61 


696 


IV-64 


697 


IV-72 


699 


IV-80 


701 


IV-85 


702 


IV-93 


703 


IV-96 


705 


V-03 


706 


V-04 


707 


V-07 


708 


V-08 


709 


V-12 


711 


V-24 


714 


V-41 


718 


V-57 


720 


V-61 


721 


V-64 


722 


V-65 


723 


V-74 


724 


V-80 


726 


VI-03 


341 


VI-04 


342 


VI-07 


344 


VI-08 


345 


VI-09 


346 


VI-12 


869 


VI-14 


871 


VI-19 


349 


VI-20 


350 


VI-21 


351 


VI-23 


878 


VI-25 


353 


VI-26 


354 


VI-48 


359 


VI-50 


893 


VI-53 


895 


VI-74 


905 


VI-76 


907 


VI-87 


911 


VI-88 


912 


VI-95 


915 
VII-02


547


VII-03


548


VII-06


551


VII-08


553


VII-09


554


VII-10


555


VII-11


556


VII-15


559


VII-17


560


VII-19


562


VII-21


564


VII-22


565


VII-23


566


VII-24


567


VII-27


568


VII-29


570


VII-32


571


VII-33


572


VII-36


575


VII-39


576


VII-41


578


VII-42


579


VII-43


580


VII-46


583


VII-48


584


VII-49


585


VII-54


589


VII-57


591


VII-58


592


VII-59


593


VII-62


594


VII-63


595


VII-64


596


VII-66


598


VII-72


600


VII-73


601


VII-77


604


VII-80


605


VII-82


607


VII-87


610


VII-90


612


VII-91


613


VII-92


614


VII-93


615


VII-96


617


VIII-09


618


VIII-10


619


VIII-13


622


VIII-16


624


VIII-20


628


VIII-21


629
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VIII-23 


630 


VIII-24


631


VIII-25


632


VIII-28


634


VIII-29


635


VIII-30


636


VIII-31


637


VIII-32


638


VIII-33


639


VIII-34


640


VIII-38


643


VIII-40


644


VIII-41
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V 1 II fv 
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U J 1 


v I HJ«J 
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V 1 II o / 
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Nucleotide sequences 

Sequence ID - 93 nt: 405

GGATCCTGTGGCCCACAGAGCTGCCCCAGCAGACGCTCCGCCCCACCCGGTGATGG 

AGCCCCGGGGGGACAATCGTGCCTGGGGAGGAGCAGGGTACAGCCCATTCCCCCAG 

CCCTGGCTGACCTGGCCTAGCAGTTTGGCCCTGCTGGCCTTAGCAGGGAGACAGGG 

GAGCAAAGAACGCCAAGCCGGAGGCCCGAGGCCAGCCGGCCTCTCGAGAGCCAGAG 

CAGCAGTTGAATGTAATGCTGGGGACAGGCATGCTGCCGCCAGTAGGGCGGGGACC 

CGGACAGCCAGGTGACTACCAGTCCTGGGGACACACTCACCATAAACACATCCCGA 

GGCAGGACAGATCGGGGAAGGGGTGTGTACCAGGCTATGATTTCTCTTGCATTAAA 
ATGTATTATTATT 
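
By way of illustration only, the nucleotide count given in a sequence header (for example "nt: 405" for Sequence ID 93) may be compared with the length of the sequence as listed. This Python sketch is not part of the original disclosure: the function name `check_record`, the toy records, and the assumption that a listing uses only the symbols A, C, G, T and N are hypothetical.

```python
import re

# Matches headers such as "Sequence ID - 93 nt: 405" and "Sequence ID 110".
HEADER = re.compile(r"Sequence ID\s*-?\s*(\d+)(?:\s*nt\s*:\s*(\d+))?")


def check_record(header, lines):
    """Join the wrapped lines of one record and compare with the declared length."""
    match = HEADER.search(header)
    if match is None:
        raise ValueError(f"unrecognised header: {header!r}")
    declared = int(match.group(2)) if match.group(2) else None
    sequence = "".join(line.strip() for line in lines)
    return {
        "id": int(match.group(1)),
        "length": len(sequence),
        "declared": declared,
        # A header without an "nt" figure cannot be contradicted.
        "length_ok": declared is None or declared == len(sequence),
        # Anything other than A, C, G, T or N points to transcription damage.
        "unexpected_chars": sorted(set(sequence) - set("ACGTN")),
    }


# Hypothetical toy records, not sequences from this listing.
ok = check_record("Sequence ID - 1 nt: 12", ["ACGTAC ", "GTNACG"])
damaged = check_record("Sequence ID 2", ["ACG.T"])
short = check_record("Sequence ID - 3 nt: 5", ["ACGT"])
print(ok, damaged, short, sep="\n")
```

A record whose length disagrees with its header, or which contains a symbol outside the assumed alphabet, would be a candidate for checking against the original listing.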

Sequence ID - 108 nt: 550

GGCTTTGACAGAGTGCAAGACGATGACTTGCAAAATGTCGCATCTGGAACGCAACA 

TAGANACCATCATCAACACCTTCCACCAATACTCTGTGAAGCTGGGGCACCCAGAC 

ACCCTGAACCAGGGGGAATTCAAAGAGCTGGTGCGAAAAGATCTGCAAAATTTTCT 

CAAGAAGGAGAATAAGAATGAAAAGGTCATAGAACACATCATGGAGGAGCTGGACA 

CAAATGCAGACAAGCAGCTGAGCTTCGAGGAGTTCATCATGCTGATGGCGAGGCTA 

ACCTGGGCCTCCCACGAGAAGATGCACGAGGGTGACGAGGGCCCTGGCCACCACCA 

TAAGCCAGGCCTCGGGGAGGGCACCCCCTAAGACCACAGTGGCCAAGATCACAGTG 

GCCACGGCCACGGCCACAGTCATGGTGGCCACGGCCACAGCCACTAATCAGGAGGC 

CAGGCCACCCTGCCTNTACCCAACCAGGGCCCCGGGGCCTGTTATGTCAAACTGTC 

TTGGCTGTGGGGCTAGGGGCTGGGGCCAAATAAAGTCTCTTTCTCC 

Sequence ID 110 

ACGAAGACAGACATCTGTGGAATGATTCACATCCTCTCAAGTTAGGAGGATGGAGG 
CCTGCTTCATTAAGAAGCTGGGGGTAGGGTGGGGGTGGGGAGAACACTTAACAACA 
TGGGGACCAGTCAGGGGAATCCCCTTATTTCTGTTTTGCATATGAGGAACCCTAGA 
GCAGCCAGGTGAGGCTCTCTAGTTTAATAAAAATCATGGAAAGACTCTTAATGCAG 
ACTCTTCTTAAGTGTTAATAGGGATTTTTTCAGCTTATTTTGGTTGCAGTTTCCAA 
TTTTTAAAAATGTTGAGGTAATCTTTCCCACCTTCCCAAACCTAATTCTTGTAGAT 
GCATTAGTGTTGAACCAATGCTTTCTCATGTCTCAATTCTTTGTATATGCATTCTT 
TTCAGATGTATTAAACAAACAAAAACCCTTC 

Sequence ID - 192 nt: 286

CCGGTAATAGAATAGAAAAGGGAGAGTGTCTTCATGCAATGTGGCATCCTGGATTG 

GGTCTCGNNAGAAAAACAGGACATTAGTGGGAAAATTGGAAATCTGAAAAAAGTCT
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GAATTTTAGTTAATATACCAATTTCAGTCTCTTGGTTTTGACAGATGTACCATGGT 

GATGTAAGATGTTGACCTTGGGGTAGGCTGGGTGAAGGGTATACAGGAACTCTTTG 

TACTATCTCTGCAACTTCTCTGTAAATCTAGTATCATTCCAAAATAAAAGTTTATT 
TAATTT 

Sequence ID 250 

GTGGAAGTGACATCGTCTTTAAACCCTGCGTGGCAATCCCTGACGCACCGCCGTGA 
TGCCCAGGGAAGACAGGGCGACCTGGAAGTCCAACTACTTCCTTAAGATCATCCAA 
CTATTGGATGATTATCCGAAATGTTTCATTGTGGGAGCAGACAATGTGGGCTCCAA 
GCAGATGCAGCAGATCCGCATGTCCCTTCGCGGGAAGGCTGTGGTGCTGATGGGCA 
AGAACACCATGATGCGCAAGGCCATCCGAGGGCACCTGGAAAACAACCCAGCTCTG 
GAGAAACTGCTGCCTCATATCCGGGGGAATGTGGGCTTTGTGTTCACCAAGGAGGA 
CCTCACTGAGATCAGGGACATGTTGCTGGCCAATAAGGTGCCAGCTGCTGCCCGTG 
CTGGTGCCATTGCCCCATGTGAAGTCACTGTGCCAGCCCAGAACACTGGTCTCGGG 
CCCGAGAAGACCTCCTTTTTCCAGGCTTTAGGTATCACCACTAAAATCTCCAGGGG 
CACCATTGAAATCCTGAGTGATGTGCACTGATCAAGACTGG 

Sequence ID 299 

CAGCGCAGGGGCTTCTGCTGAGGGGGCAGGCGGAGCTTGAGGAAACCGCAGATAAG 
TTTTTTTCTCTTTGAAAGATAGAGATTGNTACAACTACTTAAAAAATATAGTCAAT 
AGGTTACTAAGATATTGCTTAGCGTTAAGTTTTTAACGTAATTTTAATAGCTTAAG 
ATTTTAAGAGAAAATATGAAGACTTAGAAGAGTAGCATGAGGAAGGAAAAGATAAA 
AGGTTTCTAAAACATGACGGAGGTTGAGATGAAGCTTCTTCATGGAGTAAAAAATG 
TATTTAAAAGAAAATTGAGAGAAAGGACTACAGAGCCCCGAATTAATACCAATAGA 
AGGGCAATGCTTTTAGATTAAAATGAAGGTGACTTAAACAGCTTAAAGTTTAGTTT 
AAAAGTTGTAGGTGATTAAAATAATTTGAAGGCGATCTTTTAAAAAGAGATTAAAC 
CGAAGGTGATTAAAAGACCTTGAAATCCATGACGCANGGAGAATTGCGCATTTAAA 
GCCTAGTTACGCATTTACTAAACGCAGACGAAAATGGGAAGATTAATTGGGAGTGG 
TAGGATGAAACAATTTTGGAGAAGATAGAAG 

Sequence ID 300 

CTCAAAGGAGAAAAAAAACCTTGTAAAAAAAGCAAAAATGACAACAGAAAAACAAT 

CTTATTCCGAGCATTCCAGTAACTTTTTTGTGTATGTACTTAGCTGTACTATAAGT 

AGTTGGTTTGTATGAGATGGTTAAAAAGGCCAAAGATAAAAGGTTTCTTTTTTTTT 

CCTTTTTTGTCTATGAAGTTGCTGTTTATTTTTTTTGGCCTGTTTGATGTATGTGT 

GAAACAATGTTGTCCAACAATAAACAGGAATTTTATTTTGCTGAGTTGTTCTAAAA 
AAAAAAAAAAAAAAAAA 
Sequence ID 302

AGTAGAGACGGGGTTTCACTGTGTTAGCCAGGATGGTCTCGATCTCCTGACCTCGT 

GATCCGGCCACCTCGGCCTCCCGAAAGTGCTGGGATTACAGGCGTGAGCCACGGCG 

CCCAGCCCCAGCCTGTCACTTAAACTGATAAACGACAGATTAACAGTAGAAAAATT 

TTATTTTGCATACATAATGAGGCTTCAGAAAAGAGAAGTGAAAACCCAAGTAGGAG

TTTAGGGCTGGGGGCTTATATACCATTTAACAAGGGGTGATAAATTGTAAGAGAAT 
AG 

Sequence ID 304

TCCTTGGTTTCGATTTGTGGCAACAATCCAGTCTTTTTGTTTTTTTCAGGGATACC 

ATATGTAACAGGTGCCATTGTTACTGTAACTTTTCACACATGCCTTCAGTTTGATG 

TCAAAGTCATCATTTAGTGTAAACAGCAAGTTATCTGTTAGGCTGCACATCATGAA 

CTTTACTTTTAGAAAGTCTTATCTTTTATGCCACAGAAATAGCATTTGGCTATTAG 

TCATGGATGGCAAAGAAATTAATTTTGAGTTGTTTGGATAAAAATGTTTCAGTTGA 

CTGTAGTGTGTATTGAGAGACACTGCCAGTAAACAAACTCTCTTGGTAGGTGGAAA 

TCCCCTAGAAGTTACAGAAAATTGGGAGGAGGTGAACTTAATTAAATAACTTGAAT 

TGTTTAGACATATTCAGAGCTTCTTATGACCTTGAAGAAATCACCCAACTTCAAAA 

GACCTCGGTTTCTTCATTTGTAAAATTAGGGAGTTTGACTAGATGTGTAAATCTAG 

TTGTTAGTTAACTTCTAAGATGTAAAAACCCTCTTGTTTAACAAAAACCTACAAGA 

TCAAGTTGCTTATCTGAAATCTTTATGAATCAACACTAGTCACTAAGTCTAGCTCG 
ACC 

Sequence ID 306

CTTTTCCTCCCGCTGTCCCCCACGGAGGGGACTGCTCTCCCCCGCTGCATCCTTTC 
TGTGAGGTACCTTACCCACCTCAGCACCTGAGAGGGTGAAATAGAATTCTAACCTC 
GACATTCGGGAAGTGTTTTTGAGAAGTCTCGGTCGGTAAGGGAAGTCTTCCAAGTC 
CGTGCAGCACTAACGTATTGGCACCTGCCTCCTCTTCGGCCACCCCCCAGATGAGG 
CAGCTGTGACTGTGTCAAGGGAAGCCACGACTCTGACCATAGTCTTCTCTCAGCTT 
CCACTGCCGTCTCCACAGGAAACCCAGAAGTTCTGTGAACAAGTCCATGCTGCCAT 
CAAGGCATTTATTGCAGTGTACTATTTGCTTCCAAAGGATCAGGCCCTGAGAACAA 
TGACCTTATTTCCTACAACAGTGTCTGGGTTGCGTGCCAGCAGATGCCTCAGATAC 
CAAGAGATAACAAAGCTGCAGCTCTTTTGATGCTGACCAAGAATGTGGATTTTGTG 
AAGGATGCNCATGAANAAATGGACNAGCTGTG 

Sequence ID - 308 nt: 373

AAGTGGGTCTTGCCATCCCTGAACTGNAATCATCCCTAACATATTCATACCTGTTT 

TCATTTTAAAAGTTGGGTCAGTTTTTTTATTAGTACATGTATTTCTATCCTACTGA 
TTTATTTGCTATATCATCTAATTTAGTTTGAATATTCCATAATTTACTTAATTAGT

CCTGTATGGAGACCTAGCTCTTCTCAGTGTCTACTATTATAAACAATGCTACAGTG 

AATATTGGTGNATAAATCCATACNCACCACGTACATATCTTAAGTTCTGGAAGAGA 

TATTGCTAAACCAGAAGATAACCTGCATTTAAAATTTGACTGCTAGGGNCAGGGNC 
ACATTTAATTAAATTAGAACAANGAATGCATAATGNC 

Sequence ID 309 

CCGGAATCGCGGCCGCGTCGACGAAAATATGTGCCCTGGCCAACTCCACAGGACTA 

GTTCTAGGCAATCTGAAGGAAACCAGAAAATGTGAATTTCTCTTCCCTCAAAAAGC 

TATACTGAAGTAGTATTTAATATTCAAGTACTTGTAAATTTGCAGAACAGTACTTT 

TTAATTTGACCCATGAATTCTATTTAAATTTGTCACTTAATATTTAGCCAAGAAGC 

AAACCATCTAAAAAGATTTCTGGTTTATTTCTCCAACTCCTAATAAATAGGGTCAC 

ATATTTTTTAACTTTTTTCTAATTTGAAAAGTAATACAGGCATATGGTATTTTAAA 

AATGAAACAACACAAAGGGATATGTTTTGAAAAGTGGTCTTGCCATCCCTGAACTG 

TAATCATCCCTAACATATTCATACCTGTTTTCATTTTAAAAGTTGGGTCAGTTTTT 

TTATTAGTACATGTATTTCTATCCTACTGATTTATTTGCTATATCATCTAATTTAG 

TTTGAATATTCCATAATTTACTTAATTAGTCCTGTATGGAGACCTAGCTCTTCTCA 

GTGTCTACTATTATAAACAATGCTACAGTGAATATTGGTGNATAAATCCTACACAC 

CACGTAACATATCTTAAGTTCCTGGAAGAGATATTGCTAAACCAGAAGATAACCTG 

CATTTAAAATTTGACTGCTAGGGTCAGGGTCACATTTAAATTAAATTAGAACAAGG 

AATGCATAATGTCTTCGATAGCAATCTATTCAAGGTGCACCGTGGTCACAAAGGAA 
AGCAAAACTGTC 

Sequence ID - 310 nt: 564

CCTGGNCAGAGGCCTCTATCCTGTANTGATAATTGCCATCAAAATTGTCAAAAANG 

ATTTAATTTCTATGGGNAATAGTCCTTTTCTTAGCTTCTGCCNNTCACTTGCTTAT 

TTTTTGTGTGGGAATGGGGTTGGATAAACCAATGAACTTTATTATAAACAAATCCC 

ACCTATATCTANCAAATTTATATTTTCGGTGAAATACAGATATTTGCCTTTCTGGA 

GTANTATAGAAGCTGTCAATATGTATCTACTGTACAGTACTAAATAGTATTCATTT 

ATGAAATGAGTAGTGTTTGGGTGGCTGGGGTTAAGGAAAAATGAGACTTGGAATTG 

TAGCTTTTATCCAAGTTTTGAGTATAAATAGGGTTTTGTTTTGTTTTTTTTAACCT 

AAAAACTGAAATGCCATATAGAAAAACAGCATTGTTTTTACAGTTTGTAGTAAGTA 

ACTTTTTAAAGATTTTATCAAAAAGAATTTTGTCTATNGTGAGTAAAAGAAGTTCT 

AATAATGGCCTAATCACTGCATTTTTAAAAAACAAAGTTCAACACAAATGACATTT 
GTTT 



Sequence ID 311 
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CCTCTCCTCCATCTAAAGGCAACATTCCTTACCCATTAGTCTCAGAAATTGTCTTA 

AGCAACAGCCCCAAATGCTGGCTGCCCCCGGCCAAGCATTGGGGCCGCCATCCTGC 

CTGGCACTGGCTGATGGGCACCTCTGTTGGTTCCATCAGCCAGAGCTCTGCCAAAG 

GCCCCGCAGTCCCTCTCCCAGGAGGACCCTAGAGGCAATTAAATGATGTCCTGTTC 
CATTGG 

Sequence ID - 313 nt: 554

CCCGGAATCGCGGCCCGCGTCGACAACAAACCTGCATGTTCTGCACATGTATCCAG 
GAACTTAAAAAAAAAAAAAGATAGTTTGTGTGTCTTAATTGAATAATAGTAGATTT 
ATAGATTAAAGATCTATGGGTTTTTAATATGGATTANAAATCTGTGGGTTTTTGAT 
ATGGATTANAAATCTGTGGGTTTTTAATATGGATTGGAAATCTGTGGGTTTTTAAT 
ATGGATTAAAAAACATCTGTGGGTTTTTAATATGGATTAAACATCTGTGGGTTTTT 
AATATGGATTAAACATCTGGGTTTTTAATATGGATTAAACATCTGTGGGTTTTTAA 
TATGGGTTAAAAATCAAAAGAAAATGAACTATTTGCTCCAGTGCAGGAAAATACAG 
GCAATACTGGATACAATTAGATGGTCAGGAGCGATAACCCGGTTGCCATTGTTTGA 
AGAAGAGAATAAGGNGCTAGCATTCCTATCCGTAGATAATTTGACAGCTAGGAAAT 
AGGGGGAGTCTTCTATGTAGTTAGTGAAGGCTAAATGAACTATTATATGC 

Sequence ID 314 

CTTTTCCTCCCGCTGTCCCCCACGGAGGGGACTGCTCTCCCCCGCTGCATCCTTTC 
TGTGAGGTACCTTACCCACCTCAGCACCTGAGAGGGTGAAATAGAATTCTAACCTC 
GACATTCGGGAAGTGTTTTTGAGAAGTCTCGGTCGGTAAGGGAAGTCTTCCAAGTC 
CGTGCAGCACTAACGTATTGGCACCTGCCTCCTCTTCGGCCACCCCCCAGATGAGG 
CAGCTGTGACTGTGTCAAGGGAAGCCACGACTCTGACCATAGTCTTCTCTCAGCTT 
CCACTGCCGTCTCCACAGGAAACCCAGAAGTTCTGTGAACAAGTCCATGCTGCCAT 
CAAGGCATTTATTGCAGTGTACTATTTGCTTCCAAAGGATCAGGCCCTGAGAACAA 
TGACCTTATTTCCTACAACAGTGTCTGGGTTGCGTGCCAGCAGATGCCTCAGATAC 
CAAGAGATAACAAAGCTGCAGCTCTTTTGATGCTGACCAAGAATGTGGATTTTGTG
AAGGATGCACATGAAGAAATGGAGCAGGCTGTGGAAGAATGTGACCCTTACTCTGG 
CCTCTTGAATGATACTGAGGAGAACAACTCTGACAACCACAATCATGAGG 

Sequence ID 315 

TGGTACAGATACAAACTGGACTCTCAGGACAAAACGACACCAGCCAAACCAGCAGC 
CCCTCAGCATCCAGCAGCATGAGCGGAGGCATTTTCCTTTTCTTCGTGGCCAATGC 
CATAATCCACCTCTTCTGCTTCAGTTGAGGTGACACGTCTCAGCCTTAGCCCTGTG 
CCCCCTGAAACAGCTGCCACCATCACTCGCAAGAGAATCCCCTCCATCTTTGGGAG 
GGGTTGATGCCAGACATCACCAGGTTGTAGAAGTTGACAGGCAGTGCCATGGGGGC 
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AACAGCCAAAATAGGGGGGTAATGATGTACGGGCCAAGCACTGCCCAGCTGGGGGT 
CAATAAAGTTACCCTTGTACTTG 

Sequence ID 316 

CGCCACTTATCCAGTGAACCACTATCACGAAAAAAACTCTACCTCTCTATACTAAT 
CTCCCTACAAATCTCCTTAATTATAACATTCACAGCCACAGAACTAATCATATTAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 

Sequence ID 321 

CAGAACAGTACTTTTTAATTTGACCCATGAATTCTATTTAAATTTGTCACTTAATA 
TTTAGCCAAGAAGCAAACCATCTAAAAAGATTTCTGGTTTATTTCTCCAACTCCTA 
ATAAATAGGGTCACATATTTTTTAACTTTTTTCTAATTTGAAAAGTAATACAGGCA 
TATGGTATTTTAAAAATGAAACAACACAAAGGGATATGTTTTGAAAAGTGGTTCTT 
GCCATCCCTGAACTGTAATCATCCCTAACATATTCATACCTGTTTTCATTTTAAAA 
GTTGGGTCAGTTTTTTTATTAGTACATGTATTTCTATCCTACTGATTTATTTGCTA 
TATCATCTAATTTAGTTTGAATATTCCATAATTTACTTAATTAGTCCTGTATGGAG 
ACCTAGCTCTTCTCAGTGTCTACTATTATAAACAATGCTACAGTGAATATTGGTGT 
ATAAATCCATACACACCACGTAACATATCTTAAGTTCCTGGAAGAGATATTGCTAA 
ACCAGAAGATAACCTGCATTTAAAATTTTGACTGCTAGGGTCAGGGTCACATTTAA 
ATTAAATTAGAACAAGGAATGCATAATGTCTTCGATAGCAATCTATTCCAGGTGCA 
CCGTGGTCACAAAGGAAAGCAAAACTGTCAATAACTTTCTTCTCA 

Sequence ID 322 

TAGCATTTGGCCTTTTAAAACATTTGTTTATTTTTTTTCTGAGAATGGCTAACACA 
CTTTATTGAGGTTCGAAATTAATAAAGAAAATAAAAGAAATGTATCTTCATTCATT 
CTGTATGTTAGTGTTTTAATTACCCTTAGAATATATGGATAAAAAATACTATTCTT 
TGTCTTGGAGAAGGTAAGAGTCTAGTTAGATGAATAAGGGTTATCTATGTAGAACA 
ACTAGAGAATGAGAAGAGAGCTTATGAGATTGAGTACTACGTTATGCAGTAGAGTA 
GCACGTCATCTGCTACTGAGTATGGTGTGATAACATTGTGTAACAGGAAAGTATGA 
TCAATATCTACTTAAAATTAAGGACAATATTAGCACTACATTGCTTTATTTTAAAG 
TAAAAATTAGAGAACTAAACACAAGCATTGTAAGTACAATAAAAGCTGATCTTTCT 
AGTTAAGCAGAATAATACATGTTCAAGCATCTGCTAAATCATTAAATATAAGAATA 
TAGGGGTTTTCTATAATCTTATTTTCTTTGGAAGAGTACCTCATTTTCAAGANGAG 
AAGTTTCTAATTGCCACTTCTTTAAAAATAAAACAGGGTTTTAATGTTCCCAGCAC 
AAAAATTAATATCTCTTCAAAAAGTCTCTTGTGATTAAGTTTGAATCCCTTGTCAT 
ACTGCTTCTAATATTGACACTGACCTCCTTAGGTATTTTTCAGGGGTTATAATCTT 
TTCTTAAGGTATCTTTTTTCAAGAATTGGATACCTTGGGCTT

Sequence ID 323

CGCGTCGACTTTTAAAGTCATCTCTATAGGAAGGTGCTGGGCAGGGATCCCAGAGA 
AAGAAAGGGTCCAAGACTCCATTAACTGCCCTGGATGAAGGGCACTGCTACAGCAG 
CTAGTACCAGAGACTCTCCTATCTCACGGTTGAGGCAGACCCAGGATAGAATAGAG 
AATAAAAGGAATGCTTATAGGAAACAATTTTGTATGGAATGCTAGATGGCCAAGCC 
TCAGCCTTTGGTCCAGTGCAACCCTTGCCTCGCTTGTCAACAGTGAAAAATTAGTT 
TGGTTAGAAGAACCATCTGGAAACACACCAGCTTCTGCTACCTTCATGCTCATTGT 
TAAAAAAAGATTAACCAGTGTGAACATTCTGATCTGTTAATTCCAGGGACTGTTTT 
CTTTCCAATGGACTGTTTGTTGGTAGAATAACCCCCAAAAGCTCAAAGCTAAAATG 
CATCATCAGTCCTAGTCGGCAGTTCCTTAAGAATGGACTGGCGGCGTGGTTGAGCT 
GATATGGAAAAGCTGCACCTTCCTGCAGAAGATCAACTGACCTGCTATCCCACCCC 
AAATTCAACCTGAGGTATATTTCAGTGAAGCAGGTAGCTGTGCTTCTCAAAGCAGA
GAAGCAGTTTTAAGAACCAAAAAGGTAGAGGAAATCTA 

Sequence ID 324 

GTTTGTTACAGGCAGAATTGGATAGATACAGCCCTACAAATGTATATGCCCTCCCC 

TGAAAAAAATTGGATGAAAATCTGCACAGCAAAGTGAAACACACAGATAATAGGAA 

CAAAATGTAGTTCCCATGTGCCAAACAAAATAAATGAAATCTCTGCATGTTTGCAG 

CATATCTGCCTTTTGGGAATGTAATCAAGGNATAATCTTTGGCTAGTGTTATGTGC 

CTGTATTTTTTTAAAATGGTACACCAGAAAAGGACTGGCAGTCTACTTCTACCATA 

GTTAAACTTCACCCTCTTTAATTTCACAACATATTCTTTGGAAGCAGGAAGAAATG 

CTCATAAAGAGGATCAGACCTTCTTTCCCGTGAAACCAGTATTTGGCGCCATATAT 

AAGCCTGGTTAAATTGGTCATCTAAAGCTGTCAAATAAGACATTCTGTGAAAGGTA 

AACATCGAAACTGGTTATAAGTAAAACCATCAAGCCAACAACAGGGTCTTGAGATA 

ACCTTTGAAGCTTATTGTCTGGCCTGCACCAGAAGATGTCTGCATTACTCATTGCT 

AAAAATGTGTACACAGAACTGCACTAGGATTAATTGGTTCAAGAAGAAATTTAAAC 

TTACGTTTGGGTTTCCATACAGCACTCTATTGAATACATGCATCTGAATTTAAGTT 
GCAA 

Sequence ID 325

GACCAGTAATGGCTTTTAAGAGTCCATTTTGTCATTGTCTCCCTAGTTAATTACAG 
GTGGGGGATCTTTTGCCTCTATTCTCTTCATATTGAAATGAATCATACTCATGTTT 
TGTGGAACTCCTTAAAGTTGTAGCTGTCATGATCAGATTTTTTTTATATTTCCTCA 
GCTTAACTCTGCTACTTGATTTACAGTGACCCATAACCTACTCATCCTTGGTTTAT 
AGTGACACATAATCTTATCTCTTTATAGAACCTTAAATTTTATCATTATTTTCGCT 
TAGAATACAGCATTTCTTTGCTTCTGTTGCTGGTTTGACTTAAGAAATAAGGCAGT 
AACTCTGATCAATCAATTATCCATAAGGAAGGGCTTTTCATGGGTTCTATTAATTT
GTTAGTACCCTAAGTATATCTGAAAAATATGTCTATTGAGAGAAGATTTTGGCATT
CCAGATGGTATAGTCTATATATATTTAAAGTTTTGAATTTGCTTATATATACTCAG 
CTTTCTTTTTCTAGCATTTTTGCATTTACCTGTTAATTGAAGTATACCCCCCACAT 
ATAAAAGTTCCTCTTAAAGACACTGGACTCTTTCTGGGGGGCTAAAATA 

Sequence ID - 326 nt : 554 

cccggaatcgcggcccgcgtcgacaacaaacctgcatgttctgcacatgtatccag
gaacttaaaaaaaaaaaaagatagtttgtgtgtcttaattgaataatagtagattt 

ATAGATTAAAGATCTATGGGTTTTTAATATGGATTANAAATCTGTGGGTTTTTGAT 
ATGGATTANAAATCTGTGGGTTTTTAATATGGATTGGAAATCTGTGGGTTTTTAAT
ATGGATTAAAAAACATCTGTGGGTTTTTAATATGGATTAAACATCTGTGGGTTTTT 
AATATGGATTAAACATCTGGGTTTTTAATATGGATTAAACATCTGTGGGTTTTTAA 
TATGGGTTAAAAATCAAAAGAAAATGAACTATTTGCTCCAGTGCAGGAAAATACAG 
GCAATACTGGATACAATTAGATGGTCAGGAGCGATAACCCGGTTGCCATTGTTTGA 
AGAAGAGAATAAGGNGCTAGCATTCCTATCCGTAGATAATTTGACAGCTAGGAAAT
AGGGGGAGTCTTCTATGTAGTTAGTGAAGGCTAAATGAACTATTATATGC 

Sequence ID 327 

CGGCTACCGACAGAAGGACTATTTCATCGCCACCCAGGGGCCACTGGCACACACGG 
TTGAGGACTTCTGGAGGATGATCTGGGAGGGGAAGTCCCACACTATCGTGATGCTG
ACGGAGGTGCAGGAGAGAGAGCAGGATAAATGCTACCAGTATTGGCCAACCGAGGG 
CTCAGTTACTCATGGAGAAATAACGATTGAGATAAAGAATGATACCCTTTCAGAAG 
CCATCAGTATACGAGACTTTCTGGTCACTCTCAATCAGCCCCAGGCCCGCCAGGAG
GAGCAGGTCCGAGTAGTGCGCCAGTTTCACTTCCACGGCTGGCCTGAGATCGGGAT 

TCCCGCCGAGGGCAAAGGCATGATTGACCTCATCGCAGCCGTGCAGAAGCANCAGC

AGCAGACAGGCAACCACCCCATCACCGTGCACTGCAGTGCCGGAGCTGGGCGAACA 
GGTACATTCATAGCCCTCAGCAACATTTTGGAGCGAGTAAAAGCCGAGGGACTTTT 
ANATGTATTTCAAGCTGTGAAGAGTTTACGACTTCAGAGACCACATATGGTGCAAC 
CCTGGAACAGTATGAAATGTGCTACAAAGTGGTACAAGATTTATTGATATATTTCT 

GATTATGCTAATTTCAATGAAGATCCTGCCTTAAATATTTTTTAATTTAATGGCAN

AT 



Sequence ID 328 

CAAGACTCCATCTCAAAAAAAAAAAAAAATCTACAGTGCTGAGTATATAAAATTAT 
TAACACATTTCACAACAATATGTGTTTGTGGAGTTAAATATTTTTTGTCTTTAAAA
CAGGTAATTTTAGTGCATACTTAATTTGATGATTAAATATGGTAGAATTAAGCATT 
TTAAATGTTAATGTTTGTTACATTGTTCAAGAAATAAGTAGAAATATATTCCTTTG
TTTTTTATTTAAATTTTTGTTCCTCTGTAAACTAAAAGAACACGAAGTAATTGGTC
ACAATTACTGGTGTTTAACTGCCAAATATGGGTAAATAAGGGAAAATTTTGTTTAA 
TATTTAGTCCTTCTGAGATGGCTTGAATATTTGAATTTTGTTGTACGTCTATACTG 
GGTAGTCACAAGTCTTATAAACACTTTAGAGGAAAGATGGATTTCAGTCTGTATTT 
TTAAACATCATTTATTTTAAATCTGGTGCTGAAAAATAAGAAAAAAATTAAACTGC 
ATTCTGCTGTTCTTCTTTANAAGCATTCCTGCGTAAATACTGCTGTAATACTGTCA 
TGCAAAGTGTATCCTTTCTTGTCGTATCCTTTTTGGGGCAGTGGTTTTT 

Sequence ID 330

GCGGGAATCGCGGCCCGCGTCGACCTCAAAGGAGAAAAAAAACCTTGTAAAAAAAG 
CAAAAATGACAACAGAAAAACAATCTTATTCCGAGCATTCCAGTAACTTTTTTGTG 
TATGTACTTAGCTGTACTATAAGTAGTTGGTTTGTATGAGATGGTTAAAAAGGCCA 
AAGATAAAAGGTTTCTTTTTTTTTCCTTTTTTGTCTATGAAGTTGCTGTTTATTTT 
TTTTGGCCTGTTTGATGTATGTGTGAAACAATGTTGTCCAACAATAAACAGGAATT 
TTATTTTGCTGAGTTGTTCTAAAAAAAAAAAA 

AAAAAAAAAAAAAATTTTAAAATTTTTAZ^AATAAAACCCTTGGTTAT

Sequence ID 331

GCCGCGTCGACCTGCATGAGCCACAGTTTCTTGACTGGAGGCCATCAACCCTCTTG 
GTTGAGGCCTTGTTCTGAGCCCTGACATGTGCTTGGGCACTGGTGGGCCTGGGCTT 
CTGAGGTGGCCTCCTGCCCTGATCAGGGACCCTCCCCGCTTTCCTGGGCCTCTCAG 
TTGAACAAAGCAGCAAAACAAAGGCAGTTTTATATGAAAGATTANAAGCCTGGAAT 
AATCAGGCTTTTTAAATGATGTAATTCCCACTGTAATAGCATAGGGATTTTGGAAG 
CAGCTGCTGGTGGCTTGGGACATCANTGGGGCCAAGGGTTCTCTGTCCCTGGTTCA 
ACTGTGATTTGGCTTTCCCGTGTCTTTCCTGGTGATGCCTTGTTTGGGGTTCTGTG 
GGTTTGGGTGGGAAGAGGGCCATCTGCCTGAATGTAACCTGCTAGCTCTCCGAAGC 
CCTGCGGGCCTGGCTTGTGTGAGCGTGTGGACAGTGGTGGCCGCGCTGTGCCTGCT 
CGTGTTGCCTACATGTCCCTGGCTTGTTGAGGCGCTGCTTCAACCTGCACCCCTCC 
TTGTCTCATAGATGCTCCTTTTGACCTTTTCAAAATTAATATGGATGGGAAAGCTC 
CTATGCCTTTTGGCTTCCTGGTAGAAGGCGGGATGCCCAAGGGTCTGCCTGGGTGT 

GGATTGGATGCTTGGGGTGTGGGGGTTGGAAACTGTCTTGTGGCCCACTTGGGCCC 
C 

Sequence ID 335 

CCCGCGTCGACTTTTAAAGTCATCTCTATAGGAAGGTGCTGGGCAGGGATCCCAGA 
GAAAGAAAGGGTCCAAGACTCCATTAACTGCCCTGGATGAAGGGCACTGCTACAGC 
AGCTAGTACCAGAGACTCTCCTATCTCACGGTTGAGGCAGACCCAGGATAGAATAG
AGAATAAAAGGAATGCTTATAGGAAACAATTTTGTATGGAATGCTAGATGGCCAAG
CCTCAGCCTTTGGTCCAGTGCAACCCTTGCCTCGCTTGTGAACAGTGAAAAATTAG 
TTTGGTTAGAAGAACCATCTGGAAACACACCAGCTTCTGCTACCTTCATGCTCATT 
GTTAAAAAAAGATTAACCAGTGTGAACATTCTGATCTGTTAATTCCAGGGACTGTT 
TTCTTTCCAATGGACTGTTTGTTGGTAGAATAACCCCCAAAAGCTCAAAGCTAAAA
TGCATCATCAGTCCTAGTCGGCAGTTCCTTAAGAATGGACTGGCGGCGTGGGTGAG 
CTGATTTGGAAAACTGCCCTTCTGCAAAAAACACTGGCCTGCTTTCCA 

Sequence ID 337

CAAGACTCCATCTCAAAAAAAAAAAAAAATCTACAGTGCTGAGTATATAAAATTAT
TAACACATTTCACAACAATATGTGTTTGTGGAGTTAAATATTTTTTGTCTTTAAAA 
CAGGTAATTTTAGTGCATACTTAATTTGATGATTAAATATGGTAGAATTAAGCATT 
TTAAATGTTAATGTTTGTTACATTGTTCAAGAAATAAGTAGAAATATATTCCTTTG 
TTTTTTATTTAAATTTTTGTTCCTCTGTAAACTAAAAGAACACGAAGTAATTGGTC 

ACAATTACTGGTGTTTAACTGCCAAATATGGGTAAATAAGGGAAAATTTTGTTTAA
TATTTAGTCCTTCTGAGATGGCTTGAATATTTGAATTTTGTTGTACGTCTATACTG 
GGTAGTCACAAGTCTTATAAACACTTTAGAGGAAAGATGGATTTCAGTCTGTATTT 
TTAAACATCATTTATTTTAAATCTGGTGCTGAAAAATAAGAAAAAAATTAAACTGC 
ATTCTGCTGTTCTTCTTTAGAAGCATTCCTGCGTAAATACTGCTGTAATACTGTCA 

TGCAAAGTGTATCCTTTCTTGTCGTATCCTTTTTGGGGCAGTGGTT

Sequence ID 338

CTGGACTGCATGACCAGATCTGATGGGTGAGACTCAGGTGGCATGGAAGAGCCGAA 
AGAGGATACCATATGTGGGTGCCGGGGGGGATAGGTGAGAAGTACTAGAAGGCGGA 
ATGGAAGGACACTTCTGCTCAGCTCTGTGACACGGGCAGGGACCCTGCAGGGCTCA
GGTCCTTTAACACAGCAGCTTCATTCTAACACCAGCAGCGTTGGAACACACGTACA 
AGTATGCAGACTAAGCTCTTGCTTGGCTGATACGGCTTTTTGGGTTTTTAGAGAAC 
ATGCATATATGTTCTCATTCATGGTACATGAACTCAGAAGCCTTACTGCCTATTTT 
TGTTAATACTTCTGGGC^lACATTACCACTTACAACTCACACCAGTTAGAT^ATCAT 

TTGTAAAATGTTATTTAATAAAGCCAAAGAACTAAATCATATTTATTTTCCAAGGN

TTTCTAAGATCTCTGAAACTAATGAGGTTTTTTAAATCCCCATTAAGTACTCATCA 
CTGCTAGTAAAAGCAGTTGTCTTTACCTTTAATTCCAGTGAGTCCCCTTAAATTTA 
TTTTTTATTATCTTTGGCTACATTGCCTTAGACAAAATGTGGTCACCCTAATTTAA 
NGGATAAAATTCACATCCTCACAGATTTCTTATTAAGAGGGTCTAANCCTTGAATA 
ATCANCAGTGGAAATGGAAGTCTTCTTTACTGGNTTTNATCCTTTCCCTTTTTTAT
CCCATG

Sequence ID 339

TTTTTTTTTAAATAAAGCTGTCGGCACTCAAGGGTAATTTCATATCAGTGTGNTCT 

ACAAGCTGGGGGAAAATGAGTTCTAATTGTCANAGCTACCAAATCCTTCACCTTTA 

GCATAAAGGTTTAAAGATATCACAAAGATGCCAAGTGATTAATAATGTTTTAAACC 

ACCCCTTTTTCTGTCTGAAAAAACAACTAAAACAATATTACAACAGTATAGTTACA 

GAAGGGTTCTATTTTCATATGTTTTATGCACACTGTGCCTCAAAGGTACTATTTAA 

ATATATATACTTTTGAGGGGGTGGCTAATGCAGAAACACCCAAGACCTAAGGAAGA 

TACAACCCCATTTCTAGGTGTGAGGTCTAAATGCTTCACACACCCACTTGTGACCT 

TTTTTCATGAAGAATCATAACACTGTGCAGTGAGAAACAGTGGCAAAGCAATACTG 

AAAGCATTTTAAATTATTTACTAGGTTAAAAGGGTGAACTGATACTTTAAATACAT 
CAAATTTCATCAT 

Sequence ID 360 

GCAAGTGAGAGCCGGACGGGCACTGGGCGACTCTGTGCCTCGCTGAGGAAAAATAA 

CTAAACATGGGCAAAGGAGATCCTAAGAAGCCGAGAGGCAAAATGTCATCATATGC 

ATTTTTTGTGCAAACTTGTCGGGAGGAGCATAAGAAGAAGCACCCAGATGCTTCAG 

TCAACTTCTCAGAGTTTTCTAAGAAGTGCTCAGAGAGGTGGAAGACCATGTCTGCT 

AAAGAGAAAGGAAAATTTGAAGATATGGCAAAAGCGGACAAGGCCCGTTATGAAAG 

AGAAATGAAAACCTATATCCCTCCCAAAGGGGAGACAAAAAAGAAGTTCAAGGATC 

CCAATGCACCCAAGAGGCCTCCTTCGGCCTTCTTCCTCTTCTGCTCTGAGTATCGC 

CCAAAAATCAAAGGAGAACATCCTGGCCTGTCCATTGGTGATGTTGCGAAGAAACT 

GGGAGAGATGTGGAATAACACTGCTGCAGATGACAAGCAGCCTTATGAAAAGAAGG 
CTGCGAAGCTGAAGGAAAAATACGAAAAGGTA 

Sequence ID - 361 nt : 622

CTGTNATNGAATCTGCTTGTNACTNAAATGCTAAACTCAATTCTGTAATTCAATAG 

GTGCACCTNTCTGAGAAACATANNAGACAATGAGGAAAAGGATTCANCATTCCGTG 

GAATTTGTACCATGATCAGTGTGAATCCCANTGGCGTAATCCAAGTAAGATGTTCA 

CAAAGATTTGTTTTTAATGTCTAATTAATAAAATTTTAAAGGAAGAAACATTCTAA 

TACTTTAATTATAAAAAGTTAACTATTTTCAAAGGTATCAAAATACAGTTAAACCT 

TTAAAATGTATATTTCTTAATATCTTGAAATTGTAATGCCTTTTTTTTTTCCTAAA 

TTTTTTTTGTCATGAAATGAGATAGTAACAGCAGATTGGGACAACAAGGTTATATT 

CTTGTCTTGAATCAGGCCATGGCTTCTTTCATCCAAATTTCAGACCTCATTTATTT 

ACTTTGTCCCTGCCTCCCATCCCTGGATATCANGTTTGTGGATATCTACAGTTAAT 

AGAGTGACCAAATAGTAGGAATACTGTCTCTCTATTCTGAATAAAATACTTTGAAT 

CAGATTTAGAAATAATGAATAAAATACAAATCACCATTGAAATTGCTCTAATTTTG 
AGAGCT

Sequence ID - 363 nt : 628

ATCACNTGAGGCAAGAGTTTGAGCCAGCCTAGCTAACATGGTGAAACCCCATCTCT 

ACAAAAATATAAAAATTAGCCTGGGTGGTGATGGGCACCTGTAACCCCAGCTACTC 

GGGAGGCTGAGGTAGGAGAATCACTTGAACCCGGGAGATGGAGGTTGCAGTGAGCC 

AAGATCGTGCCACTGCACTCCAGCCTGTGTGACAGAACAAGACTCTGTCTCAAAAA 

AAAATAATAATAATAATAATAATAAAAAGGAATAACATAGCTAGGAATAAATTTAA 

TCAAAGAGGTGAAAGACTTATACACTTAAAACTACAAAAAAAAAATCACTGAAGGA 

ATTATAGACCCAAATAAAAATAAATAAAAAGACATTCTGTGTTTTAGGGAAAGAAG 

ACTTAATATTGTTAAGATGTCAATACTACCCAAAGTGATCTACAGATTCAACATAA 

TCCCTATCAAAATTCCAACAGCCTACTTTGTAGAAATGGAAAAGCCAATTTTCAAA 

TTCAGATGGAATTGCGAGGGGTTCTGAATAACAAAAACAATCTTGGGGAAAAAAAA 

CAAAAAACAAAGTCAAAGAACTCACACTTCTCTATTTATAAATTTACTACAAAGTT 
ATAGTAATCAAA 

Sequence ID - 364 nt : 528

TGAACATCCAGCCATGTCATTTCTTCCATTCCTGCCCTGGAGTAAAGTAGATTTAC 

TGAGCTGATGACTTGTGTGCATTTGTACATTGCAACCTTAGCTTACCTCTTGAAGC 

ATGTAGAGCATTCATCACCCACCATTCATTCACTGCCTACTCCCACCACAGCTGTT 

TCGTGGTCTGTCTGCTCCCTGTGCCACCCCCACCCCATCAGGTGGGCCTTTTGCAA 

GTGATGAAGTCACCTGTGGGGGAAGAGCTTTCCTTTCCTCTCCTCAACTCAGAAGG 

CCTCTTCCTCTTGCTCAAGAGGGTGCTGCTGCTTTCTGCCTCCTTCCCCGGCCGGC 

CTCCATCCCAGTTCACCTTTTCAGAAATGGCCCCTCAGTCAACTCTTCCCTTTTCT 

CCTGGCTTTTTATTTCTCCCAGTCTCTTAAGAGTATCCTTAGCTTTAAAAACAATA 

ACACAGAGGATGGGTGCAGTGGCTCATGCCTGTAATCCCAGCACTTTGGAGCCTGG 
GGCGGGCGGATCACTTGAGGNCA 

Sequence ID 365 

GTCCCGGAATCGCGGCCGCGTCGACCTTTTCTATGCCTGCTATATAAACAGTACCT 

TGCAAGATGTCCTGTCTGATATCCACAAAGGGGTATTGTCAACCCCAAGTTCAGAC 

AGCTTTGTATTCTTCTGTCCCTGGATACATGAATTACTGCCATCTTTACACAGCGC 

CCTAAAATACCAACGCGAAGTTACCTGCTCAGCTTGAAGCTGCGCTGTACCCTGGA 

ACCAGCACTTCTGCTGAATGACTCAGGATGAAGCCTCGACTTCTCCTTCCCATCCC 

ATGCCCAGACCCCAGTGGCTCCTTTCCCAATCTGATCCAGTGACTTTAAGf CCAGC 

TGTTGCAACCTGGGCATGAGGAGGAGTGCAAGATGGCTTTGTCCTACCTGGAAAGA 
GGCTTTCTGGA 



Sequence ID 366

CACCATTTACACACAGTGGGTCCTTGAATAGCATCGTTTTATTCAATGTCATTTTG
TTATAACATTGAGAAAAAAATTGATTCCCGGCTGGGGCCACTGTCTGTGCACCGT 

Sequence ID - 368 nt : 329

GAAAGATCTAAAATCGACACCCTAACATCACAATTAAAAGAACTAGAGAAGCAAGA 
GCAAATTCAAAAGCTAGCAGAAGGCAAGAAATAACTAAGATCAGAGCAGAGCTGAA 
AGAGATAGAGACACAAAAAACCATTCAAAAAAAAACAATGAATCCAGGAGTTTTTT 
TTTTAAAAAGATCAACAGAATTGACAGACTGCTAGCAAGACTAATAAAGAAGAGAG 
AAGCATCAAATAGACTCAATAAAAAATGATAAAGGGGATATCACCACCAATCCCAC 
AGAAATACAAACTACCATCAGAGAACACTATAAACACCTCTATGCAAAT 

Sequence ID 369

GAAAGATCTAAAATCGACACCCTAACATCACAATTAAAAGAACTAGAGAAGCAAGA 

GCAAATTCAAAAGCTAGCAGAAGGCAAGAAATAACTAAGATCAGAGCAGAGCTGAA 

AGAGATAGAGACACAAAAAACCATTCAAAAAAAAACAATGAATCCAGGAGTTTTTT 

TTTTAAAAAGATCAACAGAATTGACAGACTGCTAGCAAGACTAATAAAGAAGAGAG 

AAGCATCAAATAGACTCAATAAAAAATGATAAAGGGGATATCACCACCAATCCCAC 

AGAAATACAAACTACCATCAGAGAACACTATAAACACCTCTATGCAAATAAACTAG 
AAAAT 

Sequence ID 370

GAAAGATCTAAAATCGACACCCTAACATCACAATTAAAAGAACTAGAGAAGCAAGA 

GCAAATTCAAAAGCTAGCAGAAGGCAAGAAATAACTAAGATCAGAGCAGAGCTGAA 

AGAGATAGAGACACAAAAAACCATTCAAAAAAAAACAATGAATCCAGGAGTTTTTT 
TTTTAAAAAGATCAACA 

Sequence ID 371 

GCCCGGAATCGCGGCCGCGTCGACGTAAGCTCGGCTGAATCCACGGTTCAAGAACA 
GGAAAGAAGGCCAAGGCATAGGGAGTGGGGCAGTTGGGTGAATATTAGTACCTTTC 
CCTCAGNTNCATTAATTACCCCTGCCTACTCTGCACAAAAGGATNTAACAACAGTT 
TCCTTTTTAATGGCCAGGTACAGCTGCTTATATGGANGGGCATTTNTNAATGATAT 
CCTTNATCACTGTCTTAATCATCACATNCTTAAAACAATCACTTTATTGTGTTAAG 
GAAGATAAAAATGGCTGGGTTCAATTTCCGTTCTGGAAGAAATCGANTNAAAAGGT 
AACCATTTAATAATGCANAGGGCANTTTCACTGCAGACCCTAATACTGGAAATTTT 
TAAAAACAAATGAAAAACTTCTACTTTTTCTTCTAAGCTTACTTAACCACCCAAAT 
TTTCCAGCCACATATCTTCCTAGTCTACAACTGCCTTTAACTTTAAGAGATGCTCA 
AAAAAATGTAAATTCTCAAATACATTCTTATTACAATTACTGCTAACCT

Sequence ID 373

CCAGTGTGCTGGGATTACAGGCATGAGCCCTGCACCCAGCCTCTTAAACTGATCAT 

ATGATATTGGTTCTCAACCAAGGGTGACTTTGCCCCCAGAGGATACTTGGCAATGT 

CTGGAGATACTCAGTTGTCATGACTTGGACAGGTGCTACTGTCACCCAGTGGGTAG 

AGGTCAGGGATGGTGCTAAACATAGGACAGCTGTCAAGAGAAAAGAATGTACCCAG 

CCCCAAATGTCAGTAGGGCTGAGGTTGAGAAACCCAGCTGTAGCTGACGTGTGAAG 

GACAGACTGGCCTGGAAGTGTGTTTTCTGCCCCTTTCCACCCCTGCATATTAGTTA 

AGGCCAAAGGAAAAAAGGAATGCAGGAAATGCCCGTTAAAAATCTTCAAAACAATA 

TAAAATGATCAATTCCACTAAAACCCTTTACACATTTAAGTATAAAGGTATTGGTA 

GGAAAATTTGTTATTCACTGCTTTTCTCAGTGTCATGAAATAATTATTTCTGCTGT 
CAGTTT 

Sequence ID 374

AAAAAAAAAATCACTGAAGGAATTATAGACCCAAATAAAAATAAATAAAAAGACAT 
TCTGTGTTTTAGGGAAAGAAGACTTAATATTGTTAAGATGTCAATACTACCCAAAG 
TGATCTACAGATTCAACATAATCCCTATCAAAATTCCAACAGCCTACTTTGTAGAA 
ATGGAAAAGCCAATTTTCAAATTCAGATGGAATTGCGAGGGGTTCTGAATAACAAA 
AACAATCTTGGGGAAAAAAAACAAAAAACAAAGTCAAAGAACTCACACTTCTCTAT 
TTATAATTTACTACAAAGTTATAGTAATCAAAGTCGACGCGGCCGCGATTCCGGG 

Sequence ID 378 

CGACTGCGGCTCTTCCTCGGGCAGCGGAAGCGGCGCGGCGGTCGGAGAAGTGGCCT 
AAAACTTCGGCGTTGGGTGAAAGAAAATGGCCCGAACCAAGCAGACTGCTCGTAAG 
TCCACCGGTGGGAAAGCCCCCCGCAAACAGCTGGCCACGAAAGCCGCCAGGAAAAG 
CGCTCCCTCTACCGGCGGGGTGAAGAAGCCTCATCGCTACAGGCCCGGGACCGTGG 
CGCTTCGAGAGATTCGTCGTTATCAGAAGTCGACCGAGCTGCTCATCCGGAAGCTG 
CCCTTCCAGAGGTTGGTGAGGGANATCGCCCAGG 



Sequence ID 380

GCAATTTAATTTTTAATAACAAAGATACTGTATTTTAACATGGTGAAATATACTTG 
GCTAAGTCCAGATTAAAAAAAAAAAGTATCTAGCCCAACAGTACAATTATACAGCT 
TTGTACAGAACATTCCATAGATCAACAGAAAATACATTTGAGCGCAAAAATAAAAA 
ATATTTAAGGAGAATCTCTAAGCAGCATTTTATTTCTGCAAAAGACATATCTTGTC 
TGATTAAATATCTACAAGTGCTTTTCCTTTCAAAAATACATATATTCTTAATAGAC 
TAAGTCATTAACAATGACCTGGTAATTCTTTCACTTCAATTTGAATGATTTATAAG 
CTAAATCTTCAACCACAAAAAGGTTTTTATTTGTATTAAGATGTTACCACTTTTGA
CAAAAAGCTTAAAATATTTTATATTTCAAAGGAAAATTAGCAAC^TAACTTTACAA
TATATTCTATGATATTTTGATTGTGAGGGCTACTCTATTTAAAACTGATGATCTCT 
GTTGTGTTGCTCAGATGCAGGAAAGCAGCAAAA 

Sequence ID - 381 nt : 534

GACTTANATCTAAATGGACCACATTCTCTACTTAAAAAAATGCTATTAACCATGTG 
ATCTTCTCAGTCATGAGGTAATCTGGTGACTACCCTTCCTCAAAGCCAGTTGGGAT 
ATTCTTTGAATAGAGTAAAACAGTGTTTCTAGGCTGGGAGACACCAGACATAGTTG 
AGGACAGAGGTGCTAGAAAATAGGAAGTTTAAAAGCATGTGCGGTGATGCTCAGAG 
GAGGTAAACCCCACCCTCATGCTCATAGCTTCCAATCATTTTCTCTAGTTCTTAAC
TCTTAAATGTGAGAAATGCTTGAAGATTCTAGTCATCTGAAGAAAGTCTCTTTATT 
AAAGATTTTCATAAAAGAGACCAAAGCAGACAAACAGAAAAAGACATCTTGGGGAA 
AAAAACAAGGATAATGGGAAGAGAAGGAAAGTTTTAAAAATTATCAATATCCTCAG 

AGCAAAACAAGCTCCTAAAAATAAAGTTTG

Sequence ID - 382 nt : 444 

GTTAAGGAAGTCAGCACTTACATTAAGAAAATTGGCTACAACCCCGACACAGTAGC 
ATTTGTGCCAATTTCTGGTTGGAATGGTGACAACATGCTGGAGCCAAGTGCTAACA 
TGCCTTGGTTCAAGGGATGGAAAGTCACCCGTAAGGATGGCAATGCCAGTGGAACC
ACGCTGCTTGAGGCTCTGGACTGCATCCTACCACCAACTCGTCCAACTGACAAGCC 
CTTGCGCCTGCCTCTCCAGGATGTCTACAAAATTGGTGGTATTGGTACTGTTCCTG 
TTGGCCGAGTGGAGACTGGTGTTCTCAAACCCGGTATGGTGGTCACCTTTGCTCCA 
GTCAACGTTACAACGGAAGTAAAATCTGTCGAAATGCACCATGAAGCTTTGAGTGA 

AGCTTTTCCTGGGGACAATGTGGGCTTCAATGTCAAGAATGTGTCTGTCAAG

Sequence ID - 383 nt : 566 

CTTTGAAGAACTTTGCCAAATACTTTCTTACCAATCTCATGAGGAGAGGGAACATG 

CTGAGAAACTGATGAAGCTGCAGAACCAACGAGGTGGCCGAATCTTCCTTCAGGAT 

ATCAAGAAACCAGACTGTGATGACTGGGAGAGCGGGCTGAATGCAATGGAGTGTGC

ATTACATTTGGAAAAAAATGTGAATCAGTCACTACTGGAACTGCACAAACTGGCCA 
CTGACAAAAATGACCCCCATTTGTGTGACTTCATTGAGACACATTACCTGAATGAG 
CAGGTGAAAGCCATCAAAGAATTGGGTGACCACGTGACCAACTTGCGCAAGATGGG 
AGCGCCCGAATCTGGCTTGGCGGAATATCTCTTTGACAAGCACACCCTGGGAGACA 
GTGATAATGAAAGCTAAGCCTCGGGCTAATTTCCCCATAGCCGTGGGGTGACTTCC
CTGGTCACCAAGGCAGTGCATGCATGTTGGGGTTTCCTTTACCTTTTCTATAAGTT 
GTACCAAAACATCCACTTAAGTTCTTTGATTTGTCCATTCCTTCAAATAAAGAAAT
TTGGTA

Sequence ID 384 

TTTTGGGGTTTATATATAAGCCTGGTTCTTGCTGAAACTGCTTATGTTGATAACCA 

GTTAGTGAGTTCCTCTCTATTGACTTGCTGGGAAGTTTATAGAGACATTTTTTATG 

CATTCAGAGATTTCAGTACAAATCTTGAAAAAGGGACATTTAGGCCGGGCGCGGTG 

GCTCACATCTGTAACCCTAGCACTCTGGGAGGCTGAGGTGGGTGGATCATGAAGTC 

AAGAGATAGAGACCATCCTGGCAAAAATTAGCTGGGCGTGGTGGGGTGCGCCCGTA 

GTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATTGCTTGAGCCCGGGAGGCGGAG 

GTTTCATTGAGCCGAGATAGTGCCACTGCACTCCAGCCTGGACAACAGAGCGAGAC 
TGTGTCTT 

Sequence ID 386 

CTAAGGGTTTAAAGATGGAAAGAGGCATTGATGAACAGCTGGGGAAGGAGTAGTTT 

GAGGTAGATGTGCAGATGGAATGAAGAGAAGGTCTCAAGAAGAGGGTGGAGCCAAA 

GAGGGCTGCAGATTTAGAAGGCTAAAGTCTTTAGATGGCTTTGGATAGCCTGTTGT 

ATCTTGGACCATGCAGGTTACAGTGGAGCATGGAGTGGGGACAGAAGTGGAGGAAG 

GAACCAGGGAACATGGAGTGAGAAGCTAAAGGAAAGTGATGCAGTAGATACATGGC 

TCTAAAGTACTCAGGACTTTCAGAGGCTTAAACATAGGGTGACCAACTATCCCACT 

ATGCCTGATACTAAGGGCATTCCCTGGATGTGGACCTTTCATTCCCCAAATTAGGA 

AAGTCTTGGGCATACCAAGACAAGTTGGCCACCCTACTCAAAAGTATGTAAGCTAA 

CATATCTGTTCTCTAAGAGGTTAAAGCTGGATGGGGATACCAGATGTATGTACGTG 

ATGCAGTTAAACAGCAATACAAGGGGGCAAGTCTACCTGATCGGCCAATTCAATGG 
GA 

Sequence ID 387

GAAGCCAAACCAAAGGAGCTTCTACTTCATGATGCCATTTATGTAAAGTTCAGGCA 
GAGAAAATCAGTGGTTTAAGAAGTTAGAATAATGATTATCTTTGGAGGGATTGCAA 
CTGGAAGAAGTCATGATTGGGATTTCTGGGTCCTAATAGTGCTCTGTGTCTTGATC 
TGAGTGCCGACTACATGAGTGGTTAGGTTTGCAAAATTGATTGAGTTATGCACTTA 
ATGGTGTTGTCTTATTAGAGCTGATGGAGGAGAGAGGGCTTCAATTTGCACAACTG 
AGTAATCAGCTAGGCCCAGTCACTAGGTGAACAACTTACTGCTCCAATCAGCCTTA 
GAGCAGGAATCAAACTCATGTCTCAGAAAAGTTATTAATTCAGCTTGTCTTGGGAC 
TTCCTTCAGAGTCACTCTTGAATAGCTGAAATAGTAAATGTTAAATCTGTGGATGC 
AAGTGTGTAAATTATTTTAGTCATCAGCTCTAATAAGATGGCCTTTGGGGAAATGA 
GTATAAGGTCACGAAAATGAAATGGCAAGAAGGAGGTCTACTATTTCTTCTGTAAT 
ACTGATTTTTACCCCATCAGGGTCAGTCCCCAGAGGTTGTAAATGTGAAGCTTGGT
CTTTTTCTTTAATAA

Sequence ID 388

CTTTGGACACTAGGAAAAAACCTTGTAGAGAGAGTAAAAAATTTAACACCCATAGT

AGGCCTAAAAGCAGCCACCAATTAAGAAAGCGTTCAAGCTCAACACCCACTACCTA 

AAAAATCCCAAACATATAACTGAACTCCTCACACCCAATTGGACCAATCTATCACC 

CTATAGAAGAACTAATGTTAGTATAAGTAACATGAAAACATTCTCCTCCGCATAAG 

CCTGCGTCAGATTAAAACACTGAACTGACAATTAACAGCCCAATATCTACAATCAA 

CCAACAAGTCATTATTACCCTCACTGTCAACCCAACACAGGCATGCTCATAAGGAA 

AGGTTAASlAAAAGTAAAAGGAACTCGGCAAATCTTACCCCGCCTGTTTACCAAAAA 

CATCACCTCTAGCATCACCAGTATTAGAGGCACCGCCTGCCCAGTGACACATGTTT 

AACGGCCGCGGTACCCTAACCGTGCAAAGGTAGCATAATCACTTGTTCCTTAATTA 

GGGACCTGTATGAATGGCTCCACGAGGGTTCAGCTGTCTCTTACTTTTAACCAGTG 

AAATTGACCTGCCCGTGAAGAGGCGGGCATAACACAGCAAGACGAGAAGACCCTAT 

GGAGCTTTAATTTATTAATGCAAACAGTCCTAACAAACCCCAGGTCCTAAACTCCA 
AACCTGCATTAAA 

Sequence ID 389 

CGACCCGGAATTCGCGGCCGCGTCGACTGAGTTCTTGACAAGAGTGTTTTTCCCTT 
CCCGTCACAGAGTGGGCCCAACGACCTACGGCACTTTGACCCCGAGTTTACCGAAG 
AGCCTGTCCCGAACTCCATTGGCAAGTCCCCTGACAGCGTCCTCGTCACAGCCAGC 
GTCAAGGAAGCTGCCGAGGCTTTCCTAGGCTTTTCCTATGCGCCTCCCACGGACTC 
TTTCCTCTGAACCCTGTTAGGGCTTGGTTTTAAAGGATTTTATGTGTGTTTCCGAA 
TGTTTTAGTTAGCCTTTTGGTGGAGCCGCCAGCTGACAGGACATCTTACAAGAGAA 
TTTGCACATCTCTGGAAGCTTAGCAATCTTATTGCACACTGTTCGCTGGAAGCTTT 
TTGAAGAGCACATTCTCCTCAGTGAGCTCATGAGGTTTTCATTTTTATTCTTCCTT 
CCAACGTGGTGCTATCTCTGAAACGAGCGTTAGAGTGCCGCCTTAGACGGAGGCAG 
GAGTTTCGTTAGAAAGCGGACGCTGTTCT 

Sequence ID - 390 nt : 523

GAATCCCTAGAAAAAGAGAATTCCCAACTTGATGAGGAAAACTTAGAACTGCGAAG 

GAATGTAGAATCTTTGAAGTGTGCAAGCATGAAAATGGCTCAGCTACAGCTAGAAA 

ACAAAGAACTGGAAAGTGAAAAAGAGCAACTTAAGAAGGGTTTGGAGCTCCTGAAA 

GCATCTTTCAAGAAAACAGAACGCTTAGAAGTTAGCTACCAGGGTTTAGATATAGA 

AAATCAAAGACTGCAAAAAACTTTAGAGAACAGCAATAAAAAAATCCAGCAATTAG 

AGAGTGAACTACAAGACTTAGAGATGGAAAATCAAACATTGCAGAAAAACCTAGAA 

GAACTAAAAATATCTAGCAAAAGACTAGAACAGCTGGAAAAAGAAAATAAATCATT
AGAGCAAGAGACTTCTCAACTGGAAAAGGATAAGAAACAATTGGAGAAGGAAAATA

AGAGACTCCGACANCAAGCAGAAATTAAAGATCCACATTTGAAGAAAATAATGTGA 
AGATTGGAAATTTGGAAAA 

Sequence ID - 391 nt: 566 

CTTTGAAGAACTTTGCCAAATACTTTCTTACCAATCTCATGAGGAGAGGGAACATG 

CTGAGAAACTGATGAAGCTGCAGAACCAACGAGGTGGCCGAATCTTCCTTCAGGAT 

ATCAAGAAACCAGACTGTGATGACTGGGAGAGCGGGCTGAATGCAATGGAGTGTGC 

ATTACATTTGGAAAAAAATGTGAATCAGTCACTACTGGAACTGCACAAACTGGCCA 

CTGACAAAAATGACCCCCATTTGTGTGACTTCATTGAGACACATTACCTGAATGAG 

CAGGTGAAAGCCATCAAAGAATTGGGTGACCACGTGACCAACTTGCGCAAGATGGG 

AGCGCCCGAATCTGGCTTGGCGGAATATCTCTTTGACAAGCACACCCTGGGAGACA 

GTGATAATGAAAGCTAAGCCTCGGGCTAATTTCCCCATAGCCGTGGGGTGACTTCC 

CTGGTCACCAAGGCAGTGCATGCATGTTGGGGTTTCCTTTACCTTTTCTATAAGTT 

GTACCAAAACATCCACTTAAGTTCTTTGATTTGTCCATTCCTTCAAATAAAGAAAT 
TTGGTA 

Sequence ID 394

GACCCGGAATCGCGGCCGCGTCGACCATTTTAGCCAAGGTGCCTCTATAGGGGTCA 

AGACATCATGTGCCCAGACGTAAGGTCAGGAATGTCATATTTTTCTGTTAAAATCA 

TTTTATTTCTGTGTATCTTACCTTTAAATCATTGTGGTTTACTCTGAGATTCTGTA 

GTCCTAATATTGTATCATTGTGCTGTCTGCAAAACAACTTGAATCTATTTTGTTTG 

CATCTTTTGTTACATGTAACGCAGCTGTACTTTATGTTCTTTGCAACTGTTTCCAT 

TATGAGAACGCTGTGCTATTTACAAGGTTACATTTTTCTTGGCCAGGCGAGGTGGT 

CATGCCTGTGATCCCAGCACTTTGGGAGGCCAAGGTGGGCGGATCACTTGAGGTAA 

AGAGTTGAGACCAGCCTGGCTAGCATGGCGAAGCCCAGTCTCTACTAAAAATACAA 

AAATTGGCCGGGTGAAATTAGCCGGGCGTGGTGGTGTGTGCTTGTAATCCCAGCTA 

CTCGGGAGGCTGAGGCAGGAGAATCGCTTGAATCCGGGAGGCAGAGGTTGCAGTGA 

GCCAAGATCANGCCACTGCACTCCACCTCGGGGTCAAGAGCGAAACTCTGTCTCAA 

Sequence ID 395

CCGTTTTAGTCAGGATGGTCTCGATCTCCTGACCTCGTGATCCGCCTGCCTCGGCC 
TCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCCGGCGTAAATCAGGTTT 
TTTAAATGTTTGCCAAACCTTATCACTGACTTTTATAACAAAATTATTTACTATAA 
TCATTAGGGAATATTTAAGTTCTGCTAATACTTAAAATTGCAGAGTGCTAAAACCA 
GCAGTGAGTTTAGAATCAAGCTAAGCTTTATTGTTGCTACTATTTGAGGCATATTA 
GTTGACTGGTGTTCATATGCAAGGCAGTCTACTGGGTGCAACAAGGGTTAGAAGGA
TATTTTTAAAAAACTGACCCTATTCTCAGGATGAAAATAATACACTAGTAATAGTC
TGCTCTGTTGGTTAACTCCTCGTAAGGAGGTCAATTAAAATGCTGTAGTGTTGCAA 
GGGAAGGAGAGGAAGAATCATATTCCTTCACTAGCAGGATCAAGAAAGCTTTTATA 
GAAATATACAAAATCTTCACTTCTTGAAGGATTGGTAAAATTTAATAGCCAACATT 
GGGCACTTATTCATTCTCTGAGTAAATATTTATTGCAT 

Sequence ID 396

CTTAAATCTAAATGGACCACATTCTCTACTTAAAAAAATGCTATTAACCATGTGAT 
CTTCTCAGTCATGAGGTAATCTGGTGACTACCCTTCCTCAAAGCCAGTTGGGATAT 
TCTTTGAATAGAGTAAAACAGTGTTTCTAGGCTGGGAGACACCAGACATAGTTGAG 
GACAGAGGTGCTAGAAAATAGGAAGTTTAAAAGCATGTGCGGTGATGCTCAGAGGA 
GGTAAACCCCACCCTCATGCTCATAGCTTCCAATCATTTTCTCTAGTTCTTAACTC 
TTAAATGTGAGAAATGCTTGAAGATTACTAGTCATCTGAAGAAAGTCTCTTTATTA 
AAGATTTTCATAAAAGAGACCAAAGCAGACAAACAGAAAAAGACATCTTGGGGAAA 
AAAACAAGGATAATGGGAAGAGAAGGAAAGTTTTAAAAATTATCAATATCCTCAGG 
GGGACAAAATATTATATCCTATAAAGACAGATTTTTATTTTTTAAAAAAATAGAAA 
GCAAAACAAGCTCCTAAAAA 

Sequence ID - 397 nt: 534 

GACCCGGAATCGCGGCCGCGTCGACGGAAGCTCCTGCCCCTCCTAAAGCTGAAGCC 

AAAGCGAAGGCTTTAAAGGCCAAGAAGGCAGTGTTGAAAGGTGTCCACAGCCACAA 

AAAGAAGGAGATCCGCACGTCACCCACCTTCCGGCGGCCGAAGACACTGCGACTCC 

GGAGACAGCCCAAATATCCTCGGAAGAGCGCTCCCAGGAGAAACAAGCTTGACCAC 

TATGCTATCATCAAGTTTCCGCTGACCACTGAGTCTGCCATGAAGAAGATAGAAGA 

CAACAACACACTTGTGTTCATTGTGGATGTTAAAGCGAACAAGCACCAGATTAAAC 

AGGCTGTGAAGAAGCTGTATGACATTGATGTGGCCAAGGTCAACACCCTGATTCGG 

CCTGATGGAGAGAAGAAGGCATATGTTCGACTGGCTCGTGATTACGATGCTTTGGA 

TGTTGCCAACAAAATTGGGATCATTTAAACTGAGTCCAGCTGCCTAATTCTGAATA 

TATATATATATATATATCTTTTCACCATAA 

Sequence ID - 398 nt: 512

GGGGAGCCCCCTCTTCCCTCAGTTGTTCCTACTCAGACTGTTGCACTCTAAACCTA 

GGGAGGTTGAAGAATGAGACCCTTAGGTTTTAACACGAATCCTGACACCACCATCT 

ATAGGGTCCCAACTTGGTTATTGTAGGCAACCTTCCCTCTCTCCTTGGTGAAGAAC 

ATCCCAAGCCAGAAAGAAGTTAACTACAGTGTTTTCCTTTGCACCGATCCCCACCC 

CAATTCAATCCCGGAAGGGACTTACTTAGGAAACCCTTCTTTACTAGATATCCTGG 

CCCCCTGGGCTTGTGAACACCTCCTAGCCACATCACTACAGTACAGTGAGTGACCC
CAGCCTCCTGCCTACCCCAAGATGCCCCTCCCCACCCTGACCGTGCTAACTGTGTG

TACATATATATTCTACATATATGTATATTAAAACTGCACTGCCATGTCTGCCCTTT 

TTTGTGGTGTCTAGCATTAACTTATTGTCTAGGCCAAAGCGGGGGTGGGAGGGGAA 
TGCCACAG 

Sequence ID 399

TTTTGGCATTACTTAATCCAATTATAAAAACTGAATTTTTAAAAAACAGCACTTGT 

TTTTTCTTCCAAGATTAATTTGAATTTTTTTATGGACATTAGAAAACATTGCAGTT 

TAGTCATAATCAAAAATAAATCTTGAGGCTGGTAGAGCAGCTTTGTTGCTGTTTAT 

ATTTTTATTGCTTACTGGATTTCAGTGTTACCTAGTGCCATCAGTTTGGTATTTTG 

CCACCTTGCACATTCAGTGATGTTTGATTTTTCTTTTTCCTTTTTTTCATATTACT 

TTTAAATCCTGAATAGTTTGTGGCAGCTGGAGATCACCTAGTCCACCACTGTCCAA 

CATGGCAATGGTAAGTAATATTGAGTAAAGAATAGAAAATTAGTAAAATGCATGGC 

TTCAGAATTATAGCAATTTGCAAAATAGGTTAATGGATGAAAATTAGAATGACCAG 

TTTAACTTTCCCCCCAGCAGATTCTTCTGTTAAACAATGCCCCTTCAAAATAAAGG 

AAGAACAAGTGGGTGTTATACCTATGTTATTTGGCTATGTTAGCACAATATGATGG 

ACTAATTTGAGAAAAAGCATTTACTTCCTTTACTATTACTTCTTTTCTTTATAGGG 

CTAAGTCTGCCTTCTGGGTCTTTGAA 

Sequence ID 400

GAAGAAGCGCGAAGAGCCGTTAGTCATGCCGGTGTGGTGGCGGCGGCGGAGACTGC 
GGGCCCGTAGCTGGGCTCTGCGAGGTGCAAGAAAGCCTTTGAGGTGAAGGTGTATG 
AAAGTCATCATAACAGATGTTTTCCAAAAACTTGTAGAAGGTTGTGAAAAAACTAC 
TAGGATCACGCGGCATGTATTGAGCATATAGGTTGCTGTAGATGAATGTTCTTAGC 
TGTCATGTTTAAAAATACTTCTGCTTCGTTACCTCAAGTGTGGCATGCAGCATTTT 
GGAAGGAAAATTGAAGACGTGTTCAAGAAAACATGAACAGAAGCAAATGATGAAAA 
TGAGCATTTTACTTGATGTTGATAACATCACAATAAATTATGGAGAAAAATACATA 
TTTGGCTAACTTTTAATTGCTGAACAATAAAGTGTTTTCTTTTAAATCNAAAAA 

Sequence ID 401 

GAAGCCAAACCAAAGGGAGCTTCTACTTCATGATGCCATTTATGTAAAGTTCAGGC 
AGAGAAAATCAGTGGTTTAAGAAGTTAGAATAATGATTATCTTTGGAGGGATTGCA 
ACTGGAAGAAGTCATGATTGGGATTTCTGGGTCCTAATAGTGCTCTGTGTCTTGAT 
CTGAGTGCCGACTACATGAGTGGTTAGGTTTGCAAAATTCATTGAGTTATGCACTT 
AATGGTGTTGTCTTATTAGAGCTGATGGAGGAGAGAGGGCTTCAATTTGCACAACT 
GAGTAATCAGCTAGGCCCAGTCACTAGGTGAACAACTTACTGCTACCAATCAGCCT 
TAGAGCAGGAATCAAACTCATGTCTCAGAAAAGTTATTAATTCAGCTTGTCTTGGG
ACTTCCTTCAGAGTCACTCTTGAATAGCTGAAATAGTAAATGTTAAATCTGTGGAT

GCAAGTGTGTAAATTATTTTAGTCATCAGCTCTAATAAGATGGCCTTTGGGGAAAT 

GAGTATAAGGTCACGAAAATGAAATGGCAAGAAGGAGGTCTACTATTTCTTCTGTA 

ATACTGATTTTTACCCCATCAGGGTCAGTCCCCAAAGGTTGTAAATGTGAAGCTTG 
GTCTTTTTCTTTA 

Sequence ID 402 

GACCCTATTCTCAGGATGAAAATAATACACTAGTAATAGTCTGCTCTGTTGGTTAA 

CTCCTCGTAAGGAGGTACAATTAAAATGCTGTAGTGTTGCAAGGGAAGGAGAGGAA 

GAATCATATTCCTTCACTAGCAGGATCAAGAAAGCTTTTATAGAAATATACAAAAT 

CTTCACTTCTTGAAGGATTGGTAAAATTTAATAGCCAACATTGGGCACTTATTCAT 

TCTCTGAGTAAATATTTATTGCATGCTTATCTTGTATCAACATTGNGATGAAAGCN 

CAAGAATGAAAGAGGAGGGAGAATGTTTANAGAATAAGGCTGAAACACAGATTTTG 
TAGGGAGCGTAGGGGAGACTGANAAAACAG 

Sequence ID 403 

AAGACACCTGATAGATTGTCTTGTATTATTTTTCCTTTGCCTTCTTACAATCTCAG 

TGATTAGAATTGGGCTGAAAACAATACATCAAATTCTCAGCAAAATCCTTATGGGT 

TGCTGGATACCGAGGGTTTTTAAGATCTTTAGACTTCACTATATAGAACAAATGTT 
GAATGGGAATTTTCTTTATTTCTATANCGTTTNG 

Sequence ID 405 

CCCGGAATCGCGGCCGCGTCGACGATGAGCATTTTTTCATGTGTCTTTTGGCTGCA 

TAAATGTCTTCTTTTGAGAAGTGTCGGTTCATATCCTTTGCCCACTTTTTGATGGG 

GTTGTTTTTTTCTTGTAAATTTGTTTGAGTTCATTGTAGATTCTGGATATTAGCCC 

TTTGTCAGATGAGTAGGTTGCGAAAATTTTCTCCCATTTTGTAGGTTGCCTGTTCA 

CTCTGATGGTAGTTTCATTTGCTGTGCAGAAGCTCTTTAGTTTAATTAGATCCCAT 

TTGTCAATTTTGGCTTTTGTTGCCATTGCTTTTGGTGTTTTAGACTTGAAGTCCTT 

GCCCATGCCTATGTCCTGAATGGTAATGCCTAGGTTTTCTTCTAGGGTTTTGATGG 

TTTTAGGTCTAACGTTTCAGTCTTTAATCCATCTTTTAAAAGTCTCTTCACAGTAC 

ATGAGTAGTAGTGACACCAATAATGTCAGAGCAGGGAACTCCCAGGTTCTGCCCAT 

CCACAAAAACAACAAATAAGCTGGCAAAAACTTTAAGAATCAACTTTTGCAGATCT 

CTGAAATCTAGTCAAAACTTAAACAGAGGAAAGATTAATAAAGACNGGCTGCCTGA 
GATAACACTAACACACAC 

Sequence ID 406

CATCAAATAAATAAATAAATAAATTTTAAAAGTCACAGCATTGAATTTTTAAATGT
TTGGGATGATAAAGCACCTGCTTATCATGAAGCTANAGAAATTCAATGACACGTTT
GCCAGGGTCTTTGCTAGTGATGTTGGAACAAGTCTGTAATGCTGATGAAACATCAC 
TGTTCGGGCATTATTGCCCCAGAAAGACACTGACTGCAGCTGATGAAACAGCCCTT 
CCAAGAATTAAGGATGCCAAAGACCAAATAACTGTGCTGAGATATACTTACGCAGC 
AGGCATGCATAAGTGTAAACTTGCTGTTATAAGCAAAAGCTTGCGTTCTCACTGTT
TTCAAGGAGTGAATTTCATACCAATCCATTATTATGCTAATAAAAAGGCATGGATC 
ACCAGGGACATCTTTTCAGATTGGTTTCACAAACATTTTGTACCAGCAGCTTGTGC 
TTACTGCAGGGAAGCTGACTGGATGATGACTGCAAGATTTTGTTATATCTTAACAA 
CTGTTGTGCTCATCCTCCAGCTGAAATTCTCATCAAAAATAATGTTTATGGCTCAC 
ACCTGTAATCTCAACACTTTGGGAGGATTGCCTGACCCAGGAGTTCAAGCCCACCC
TGGGCAACACAGCAAGACCCAACCTNTC 

Sequence ID 407 

TTTTAAAAATCATAAAACGTTTCTTACAAAAGAGCATTACATTNTGCACACTGCTC 
TGAACAGATGCCAGGGACATGTGGACTATTGTTACTTTTCCTCCCTGTCCCACCCC
CCAAATGTTACAGTGACCACAAAGCAAGGTGTTCACAATAATTACATGGGGGGAAT 
TTTTTAAACCACCAACAATAACGAAAAATAAAATCCACTCACTCTGCTGCTGTTTC 
AAAATTTCAATGTTAGTTTTTGCACGCCCTTCCCCCCCCCAACCCTGTTTGTAAGG 
AACTAAAACATTACATCTGGTGAACAGCAAAGATTTCACTACACCTCAAATGCAGA 
ACACCTATGAAGCAGAGGAATGTTGGCTTTTTAAACAGAAGCAGATAAAAAAAAAA
GATGCAGGACTCCTTCAGTTCTTCACTAGTCTTAGAAAAACTTTCCAGAATACTGC 
TTCACACTATAAAAAAGAAAAAATATCTTGCATTAGAATCCTTCAACATCTGCATA 
CTGCTTCACACTGTTCGTTTCTAGGAGCACTTTGTCACAGGACACTTCTGCTTATA 
TTTCTTTAATCAGAACTTAGTTGGATGGGCCGGGCATGGTGGCTCACGCCTGTAAT 

CCCAGCACTTTGGGAGGCCGAGG- GGGTGGATCACC

Sequence ID 408

CCATCTCCAAATTTAGTATTCATTCTGTTTAGCATATTATCAGTTGCCATCTATTT 
GTTTTAACTGATTACTTGAATCTGATTAAACATCACAGAAATGGGCTTTGATAAGA 

ACAATATTGAATAAGAAATTTTAAATAACAAAACAGCTTATAGAAAAATTCAGCAT

AACTTTTCCATCACCTTCACCACCCTTGCCTTTTATTATCCTGTCCTGTATCACTG 
CTTTCTGTTAGCAGTGTTGTGTGAGTTAGGATTTGGGCAGGAAAGCAAAAGCAACC 
ACCCGTCATTTTCCCAGAATGAAGGGTTTGACGTAGGATGTAGACTTTGTATAGTA 
GTTGGGAGAGCTGTGGGAGTGAAGGTCAGGGATGTCACCTACAGAAGTCAGGGAAT 
CTGCCACCAGAGATCCTGCATCAGAAACAGCCAACAGCGTGCTTCTGAAGAACTAG
TGGGGAAGTGGCTATAATTCTTAGGAATCCCAGCAAGTCCGCACCACTGTCTCAGT 
CTACAGCAGTGGAGAAAGGGGTTTCCAGGAGCTCTCTGGAAAGTTCCTGCCCACAC
TTTGCAACAATCTTCAGAGGATAATGGGCTTCTCTTCCAGCTTCCACACCCAACAA
GAGTGCCTTTCATCGGCCAACTCTAACCTGGAACCCTATGGCAGAGGGGATTTAGG 
AGACAGTTTGTNATGTCTGTGGAATGCAAATGAANANGTANCAATGCTTANTTGAC 
AGCGGNCATACACAAATNTNGAAA 

Sequence ID 409

GATCCGTNGACT

Sequence ID 410 

CTCTTCCCAGCCCCTGAGCCCAGCCCCTTCCCAAGTGGTGCCAGACAAAAAACTAC
ATGGCCCTTTCGTGTCTTGGGGGTGGAAAGGGAGGGATGAATTGGGGTGATAGAAC 
CCTGGTGAATTCAGAGTAATCTTTCTTTAGAAAACTGGTGTTTTCTAAAGAAACAG 
GATAGGAGTTTAGAGAAGGCACCAAAGCTTTCACTTTGGTTTGGCACCAGTTTCTA 
ACCATCTGTTTTTTCTACCCTAGCTATCTTTTATTGGTAAAATATAAATGTATAAT 

TATGTTTGTAGAGCTTTACCAAGGAGTTTCCCTCCTTTTTTGTTTGTTGATTAGCA
AATTTTTGATTCTCCATTTTCCAAAAGTAAGAGACTCCAGCATGGCCTTCTGTTTG 
CCCCGCAGTAAAGTAACTTCCATATAAAATGGTATTTGAAAGTGAGAGTTCATGAC 
AACAGACCGTTTTCCATTTCATCTGTATTTTATCTCCGTGACTCCACTTGTGGGTT 
T 

Sequence ID - 411 nt : 505 

TGGAGCTGAAAAATTCCTATTACCTAGGGGCATCACAACGCATTGCATTTCGCCCG 
TGTTTGGGATGATGCTGGTGTAAACCTACTATGCTGCCAGTCATGTAAAAGTATAG 
CACACACAATTAGTAGGTAATGCTTGCAAATAATAATGAAAGACTCTGCTACTGGT 

TTATGTATTTACTATGCTATACTTTTTGTCATTACTTTAGAGTGTACTCCTACTTT
TTTTTTTTTTTTTTTTGAGATGGAGTTTCACTCTTGTCCTGTAGGCTGGAGCGAAN 
TGGCGCGATCTCGGCTTACTGCAACCTCCACCTCCTGGGTTCAAGCGATTCTCCTG 
CCTCANCTTCCCAGAGTAGCTGAGATTACAGGCATGCACCGCCACGCACGGGTAAT 
TTTGTATTTTTGGTAGAGACAGGGTTTCACCATGTTGGCCAGGCTGGTCACCAACT 

CCTGACCTCAGGTGACCCGCCTCCTCACCTCCAGAGTGTTGGGATTACAGGNGTGA
G 

Sequence ID 412 

ATAAAAATTAGCTGGGGGTGATGGGCCCTGTACCCCAGCTACTCGGGAGGTGAGGT 
AGGAGAATCACTTGAACCCGGGAGATGGAGGTTGCAGTGAGCCAAGATCGTGCCAC
TGCACTCCAGCCTGTGTGACAGAACAAGACTCTGTCTCAAAAAAAAATAATAATAA 
TAATAATAATAAAAAGGAATAACATAGCTAGGAATAAATTTAATCAAAGAGGTGAA
AGACTTATACACTTAAAACTACAAAAAAAAAATCACTGAAGGAATTATAGACCCAA
ATAAAAATAAATAAAAAGACATTCTGTGTTTTAGGGAAAGAAGACTTAATATTGTT 
AAGATGTCAATACTACCCAAAGTGATCTACAGATTCAACATAATGCCTATCAAAAT 
TCCAACAGCCTACTTTGTAGAAATGGAAAAGCCAATTTTCAAATTCAGATGGAATT 
GCGAGGGGTTCTGAATAACAAAACACAATCTTGGGGA

TCAAAGAACTC^CACTTCTCTATTTATAATTTACTACAAAGTTATAGNATCAAAGT 
CGACGCGCCGCGATCCGGGC 

Sequence ID 413 

CACAGTACTCCATTTTGGGGTCCAAACTGTAATGCTCAAAATAATAAATGCTTACA 
CGAAAATTATTTATTGAGAATATTCATATAAAAATTACCTAAAGCAAAGTAAAAAA 
AGTAAAATCAAGGTGGTATATTTGAAGTGAATGGTGATTGGAAATTTTTAGCTGTA 
ACAAAAAGAAAGAAAACAACTTTTTTTAAAGCCTCATTCTCTTTTCTTTCAAAATG 
TACCTTATTCCCACACACTCTTGGGCTGACCTTTATTTTATCAATAAGCTCAATAT 

TACTTTGTTTAAAATAAGATGCTTCAGCAAAAGTCATTCTCTCTTTAACCATATAA 
TTTAAAAACTCCTCTTCACGATTGATAGCAAAATCAGAAACGTTAGGGCACCAGTG 
AGTTGAAAAAACTGGTCTTAAGTTGGAAAAACTATTATTAATAATATTATCCTATC 
CATCCATATCTATTGAAATTGTCAGGTCCATAATTTCATTTTAATTAATTATAGGA 
AAGAAGAAAAGATAATACCCATTTGTTCTAT 

Sequence ID 414 

CTCAGACTCTTTCTGCCCTAATGGCCATTACTATCCAGTCTGTATTGCTACAAGGG 
ACCCACTGGTACCCCTTTTAGATTCTATCAAAAGGAACAGGGTTTTCCTAGAGGCA 
GGCAGCCTGGTGGTATGGCACAGCAGAAGCTTACTGCTAATGAAATGGGAACCTCC 

CCCTCCCTTGTGGTTTCAGCACAGAACCTGAATGCCAGGAAAAATTCCTGGGCCAA 

GAAGCTAAAGCTAAAGAAACCTTCCTTTTTTCAACGTTTTTTTTTCTTTCAAACTG 
TAGGGTCACTTTTGATTGAGGCAAAGGGGTCCTACTGTAAGTGGAAAAGACTCACT 
CCCCTAACATAAGTTTTCACTGTGGTGGGATGGTGCCGCCCGATATGCTTGATATG 
CTTTTCCTTCCACATGTTAAGCTAGGAAACCTAACAGGATGTCAGCAGGGCAGTTA 

ACTCTGGACTCANAGCCCTCAAGGGCATGTGGCANAACCTCATGGCATNCAAGACC 

A 

Sequence ID - 415 nt : 596 

GTATAATTGATTCTTTTGAACCTAAAGTATAAGACTTCACGATTAGAAAAAAATTA 
TCCAAAGACTAATGTAATTAAGTGAGGAAAAGGTGCTGGAGGAACTGGATAACCAC 
ATGGAAATGTATGAACCATGACCTCTATGTCACATACTATATATAAAACTTAATTT 
GAGGTGTATCACAGAGCTAACTGTGGGGGCTAAAACGTTGAAGCCTTTGGATGGCC 
GCACAAGAGATGTCTGCATTCATAACCTTGGGGAGGGTATGAACATTTCTTGGTAA 
CATGCAAAAAGCACTAACTGTAAAAGAGAACAGTTGGTCAGTTGAATTTCATGAAA 
CATTGTAAACTTCTGCTAAACAACTGACACCATTAAGAATGTGGAAAAAGGCTGGG 
CACAGTGGCTCATGCCTATAATCCCAGCATTTTGGGAGGCCGGGGCGGGAGAATCA 
CTTGAGGCCAGGAGTTTGAAACCAGCCTGGGCAACATGGCAAGACCCCGACTCTAC 
AAAAATATTTTTAAAAATTAGTTGGGTGTGGTGATGCACTCCTGTAGTCCTAGCTG 
CCAGGANGCTAAGGNGGAAGGATCACTTAACCCTGG 

Sequence ID 416 

CTGGTGGCGGCGGTCGTGCGGACGCAAACATGCAGATCTTTGTGAAGACCCTCACT 
GGCAAAACCATCACCCTTGAGGTCGAGCCCAGTGACACCATTGAGAATGTCAAAGC 
CAAAATTCAAGACAAGGAGGGTATCCCACCTGACCAGCAGCGTCTGATATTTGCCG 
GCAAACAGCTGGAGGATGGCCGCACTCTCTCAGACTACAACATCCAGAAAGAGTCC 
ACCCTGCACCTGGTGTTGCGCCTGCGAGGTGGCATTATTGAGCCTTCTCTCCGCCA 
GCTTGCCCAGAAATACAACTGCGACAAGATGATCTGCCGCAAGTGCTATGCTCGCC 
TTCACCCTCGTGCTGTCAACTGCCGCAAGAAGAAGTGTGGTCACACCAACAACCTG 
CGTCCCAAGAAGAAGGTCAAATAAGGTTGTTCTTTCCTTGAAGGGCAGCCTCCTGC 
CCAGGCCCCGTGGCCCTGGAGCCTCAATAAAGTGTCCCTTTCATTGACTGGAGCAG 

Sequence ID 417 

GCAGGGGCTTCTGCTGAGGGGGCAGGCGGAGCTTGAGGAAACCCGCAGATAAGTTT 

TTTTCTCTTTGAAAGATAGAGATTAATACAACTACTTAAAAAATATAGTCAATAGG 

TTACTAAGATATTGCTTAGCGTTAAGTTTTTAACGTAATTTTAATAGCTTAAGATT 

TTAAGAGAAAATATGAAGACTTAGAAGAGTAGCATGAGGAAGGAAAAGATAAAAGG 

TTTCTAAAACATGACGGAGGTTGAGATGAAGCTTCTTCATGGAGTAAAAAATGTAT 

TTAAAAGAAAATTGAGAGAAAGGACTACAGAGCCCCGAATTAATACCAATAGAAGG 

GCAATGCTTTTAGATTAAAATGAAGGTGACTTAAACAGCTTAAAGTTTAGTTTAAA 

AGTTGTAGGTGATTAAAATAATTTGAAGGCGATCTTTTAAAAAGAGATTAAACCGA 

AGGTGATTAAAAGACCTTGAAATCCATGACGCAGGGAGAATTGCGTCATTTAAAGC 

CTAGTTAACGCATTTACTAAACGCAGACCAAAATGGAAAGATTAATTGGGAGTGGT 
AGGA 

Sequence ID 418 

CCCGGAATCGCGGCCGCGTCGACGGGAGGTGATAGCATTGCTTTCGTGTAAATTAT 
GTAATGCAAAATTTTTTTAATCTTCGCCTTAATACTTTTTTATTTTGTTTTATTTT 
GAATGATGAGCCTTCGTGCCCCCCCTTCCCCCTTTTTTGTCCCCCAACTTGAGATG 
TATGAAGGCTTTTGGTCTCCCTGGGAGTGGGTGGAGGCAGCCAGGGCTTACCTGTA 
CACTGACTTGAGACCAGTTGAATAAAAGTGCACACCTTATAAAAAA 

Sequence ID 419 

CCCGGAATCGCGGCCGCGTCGACGGGAGGTGATAGCATTGCTTTCGTGTAAATTAT 
GTAATGCAAAATTTTTTTAATCTTCGCCTTAATACTTTTTTATTTTGTTTTATTTT 
GAATGATGAGCCTTCGTGCCCCCCCTTCCCCCTTTTTTGTCCCCCAACTTGAGATG 
TATGAAGGCTTTTGGTCTCCCTGGGAGTGGGTGGAGGCAGCCAGGGCTTACCTGTA 
CACTGACTTGAGACCAGTTGAATAAAAGTGCACACCTTATAAAA 

Sequence ID 420 

CTTCATTTGAAATGGTTGAATCTGCTGTGTAATAAAGTGGTTCAACCATGATTAGG 
AACTGAAATTTAGTAGAAGAGGGAAAAGGAGTTAATGTAACAAATTATTTTAGCTA 
CAAACCCCGGTAATAGAGCACTTGGGGGATGGGATGGGGTGGGTTGGTGAGACAAT 
CAGAATGGTAAATTGATTAAATGCTCCTAACCCTGTAATTTTGTGCATAGAGCACC 
CTATGCTGTGGAAATAACTGTTCTTAGATTTCATTGTAACTGGACTGTTCAGGTTG 
CCCAGAGGGAAAGAACATTCCTAATTCTAATAAAATAAACTTTTATTTTGTTTA 

Sequence ID 421 

TGTCATTGAATCTGCTTGTTACTTAAATGCTAAACTCAATTCTGTAATTCAATAGG 
TGCACCTCTCTGAGAAACATAAGAGACAATGAGGAAAAGGATTCAGCATTCCGTGG 
AATTTGTACCATGATCAGTGTGAATCCCAGTGGCGTAATCCAAGTAAGATGTTCAC 
AAAGATTTGTTTTTAATGTCTAATTAATAAAATTTTAAAGGAAGAAACATTCTAAT 
ACTTTAATTATAAAAAGTTAACTATTTTCAAAGGTATCAAAATACAGTTAAACCTT 
TAAAATGTATATTTCTTAATATCTTGAAATTGTAATGCCTTTTTTTTTTCCTAAAT 
TTTTTTTGTCATGAAATGAGATAGTAACAGCAGATTGGGACAACAAGGTTATATTC 
TTGTCTTGAATCAGGCCATGGCTTCTTTCATCCAAATTTCAGACCTCATTTATTTA 
CTTTGTCCCTGCCTCCCATCCCTGGATATCAGTTTGTGGATATCTACAGTTAATAG 
AGTGACCAAATAGTAGGAATACTGTCTCTCTATTCTGAATAAAATCTTTGAATCAG 
ATTTAGAAATAATGAATAAAATACAAATCAGCCATTGAAATTGCTCTAATTTTGAG 
AGCTTATGATTTATTCATCTTTGGTTTCCAAGTTCAAGTTATATGTAGACATTTTA 
ATT 

Sequence ID 422 

GCTTCCTAGGTGAGGTCACGAGGAAACCTGCTGGCCAAGTGACCTGGCAGGGTGTG 
GCCAGTGTGGCCAGGGCCGCCGAGCCTGCTTTCCTTCCCTGCAGCAGGAACCCTTC 
TGGGGCTGTGATCCTGCGATGGTGCCTGGGTGGGAGTGGGGGTGGGGGGCGGGATG 
GTCTCCCTACCTGCCAGCTTCTTGGTTTGAGGTGAGGACAGCCCCGGAAGCTCANA 
CTTGGCTCCTGTCCATGTAGTTGGGGCCATGAGCTCTGCAGGGACCTTGGAAAGAN 
AGAGACGGGTGGTGTANGGCANGGGAAGGCATTGTCTTCAAACAGGAAAAAGCTGA 
NAATGGAAACAGGCGAAACTTACCAAGTGTAACATCACCTGGAACTGAAGGAGGGT 

GGGAAGGTTTTAATTATTTTAAAAATAGAGATGGGGTCTCACTATGTTGCCCAGGC 
TGGTCTCAAACTACTGGGCTCAAGTGAACCTCCTTCT 

Sequence ID - 423 nt: 387 

TGTTTCTCNAGGGCGAGAGGCTGTCTTANAGCACCATTCTCTGGCCCTNGTCCCAT 
GAGAAGGAACCGCACTCAGGAGCCACACTCTCCCACTNCCCTTGCCCANAAGACTC 
ACAGAGGGCACGGAGCTGGCTGTGGTGAGAGGAGGTCCANCAAATTCCTGTCTGCA 
NAAGGGTTCTGAACACCACCGCCTGGCAGCGTGCTGGAGGAGGGATTCCTCTTTTC 
CTCACAGCAATTCTGACCAGAAACCTGTCAAATCAGGAATGGCTAAAATAAGACCA 
GGGTATGAATGACCATCAGCCACAGTAAAACCAAGGCACAGCTCTCCTGAGCCCAC 
CCAAGCTGCTGTGGCCCAGACTGGTGACATCACCTCAGGGCAAAAAAAAAA 

Sequence ID - 424 nt: 420 

CGCAGAATGGCTCCCGCAAAGAAGGGTGGCGAGAAGAAAAAGGGCCGTTCTGCCAT 

CAACGAAGTGGTAACCCGAGAATACACCATCAACATTCACAAGCGCATCCATGGAG 

TGGGCTTCAAGAAGCGTGCACCTCGGGCACTCAAAGAGATTCGGAAATTTGCCATG 

AAGGAGATGGGAACTCCAGATGTGCGCATTGACACCAGGCTCAACAAAGCTGTCTG 

GGCCAAAGGAATAAGGAATGTGCCATACCGAATCCGTGTGCGGCTGTCCAGAAAAC 

GTAATGAGGATGAAGATTCACCAAATAAGCTATATACTTTGGTTACCTATGTACCT 

GTTACCACTTTCAAAAATCTACAGACAGTCAATGTGGATGAGAACTAATCGCTGAT 

CGTCAGATCAAATAAAGTTATAAAATTG 

Sequence ID 425 

GGAAACTGATGCCAGTCAGAAACTCAGATCAAATGAAGGGGTGAAGAGAACCAGAA 
TTGATCTCTCTGTAGGAGAATATAAATGACTTTTTTAAAGTACATATTTTCTGTGA 
AAGACAGTTTTTTGTTTAATGCAAAAATGTTAACAATGTTTATATCATGTAGAAGT 
AAAAGATCGTGAAACAGCACAGAGAACAGTAGTAAGACAGATTGAATTGCACTGTT 
GTAAGATGATGAACTTACAATATTAAGTGAAGGTAGACTGTGATAGATTAAGGATA 
TATATTGTAATCCCTAGAGCAATTGTCAAAGTGGTACAGGTAAAAAGCCAATAGAG 
GTGATAAAATGGAATACTAAAAAATATCAGATGAATAATAAAGAAGACAGGAAATG 
AGGAACAGTGGAACAGAATGAATAAAAAACAAGACCATTAACTTAATCATTAATAA 
TTACTTTAAATGGGTTAAACATTATGGTTATAAGGCAGAGATTTTCAGACTAGATA 
AAAGAGCAAGCTCCACTATATACTGTCTACAAGAGATATACTTTAAAGTGTATATT 
ATATTTAAATATAAAGATTTGGAATAAATAAACCTAAGAATAAGCTTACTAGGGAA 
GTGAAAGATCTGTACAACAAGAATTACAAAACACTGCTGAACGAAATCATAGGTGA 
CCA 

Sequence ID 426 

GTCCCGGAATCGCGGCCGCGTCGACGTTTCCTCAAAATTTATCTTCCTGTTAATGT 
CAGGCATGTATCTCCTTAGCTTGCCACAAATAACTATATATACCACAGACCTTCCT 
TTGTAGGGCTAACAGTGTTGCATTGTAAGTGGAGGCCTCATAGATACCTGGCCTTT 
TCCTACCTTATTCCAAAGATGGTTGCATCTTATAAATAATGTCATTCTTCAGCAAA 
TGGTATGGAAATGAGATTGTAATGTCATTATTTCCTCTTTAAATAATCAGGACAAC 

CAAAAATAAAATTGTACCTCCTTAATCTTCTACAGAAAGATGGATTTCATTTTCAA 
CATTAAGAGGTAGTTTTAAGAAGCAGTAGAAGTCAGCCTGGGCAGCATGGTGAAAC 
CCCGTCTCTACAAAAAAGTTAGCTGGGCTTAGTAGTTGCAATCCCAGCTACTCTGG 
AGGCTGAGGTTGGAGATCATCTGANCCTGGGGAGGTCNAGGCTGCAATGATACANT 
GAGCCCTGATTGTGCCACTCCACCTGGTTGCAGA 

Sequence ID 427 

TTCCAATCTTCGTGTTCACTTTAAGAACACTCGTGAAACTGCTCAGGCCATCAAGG 
GTATGCATATACGAAAAGCCACGAAGTATCTGAAAGATGTCACTTTACAGAAACAG 
TGTGTACCATTCCGACGTTACAATGGTGGAGTTGGCAGGTGTGCGCAGGCCAAGCA 
ATGGGGCTGGACACAAGGTCGGTGGCCCAAAAAGAGTGCTGAATTTTTGCTGCACA 
TGCTTAAAAACGCAGAGAGTAATGCTGAACTTAAGGGTTTAGATGTAGATTCTCTG 
GTCATTGAGCATATCCAAGTGAACAAAGCACCTAAGATGCGCCGCCGGACCTACAG 
AGCTCATGGTCGGATTAACCCATACATGAGCTCTCCCTGCCACATTGAGATGATCC 
TTACGGAAAAGGAACAGATTGTTCCTAAACCAGAAGAGGAGGTTGCCCAGAAGAAA 
AAGATATCCCAGAAGAAACTGAAGAAACAAAAACTTATGGCACGGGAGTAAATTCA 
GCATTAAAATAAATGTAATTAAAAGG 



Sequence ID 428 

TGCAGGATCCGTCGACTCTAGATAACATGGCTAGAAAAGAGAATGAAAAAGTTGGA 
ATTTTTAATTGCCATGGTATGGGGGGTAATCAGGTTTTCTCTTATACTGCCAACAA 
AGAAATTAGAACAGATGACCTTTGCTTGGATGTTTCCAAACTTAATGGCCCAGTTA 
CAATGCTCAAATGCCACCACCTAAAAGGCAACCAACTCTGGGAGTATGACCCAGTG 
AAATTAACCCTGCAGCATGTGAACAGTAATCAGTGCCTGGATAAAGCCACAGAAGA 
GGATAGCCAGGTGCCCAGCATTAGAGACTGCAATGGAAGTCGGTCCCAGCAGTGGC 
TTCTTCGAAACGTCACCCTGCCAGAAATATTCTGAGACCAAATTT 

Sequence ID - 429 nt: 535 

CACAGTACTCCATTTTGGGGTCCAAACTGTAATGCTCAAAATAATAAATGCTTACA 
CGAAAATTATTTATTGAGAATATTCATATAAAAATTACCTAAAGCAAAGTAAAAAA 
AGTAAAATCAAGGTGGTATATTTGAAGTGAATGGTGATTGGAAATTTTTAGCTGTA 
ACAAAAAGAAAGAAAACAACTTTTTTTAAAGCCTCATTCTCTTTTCTTTCAAAATG 
TACCTTATTCCCACACACTCTTGGGCTGACCTTTATTTTATCAATAAGCTCAATAT 
TACTTTGTTTAAAATAAGATGCTTCAGCAAAAGTCATTCTCTCTTTAACCATATAA 
TTTAAAAACTCCTCTTCACGATTGATAGCAAAATCAGAAACGTTAGGGCACCAGTG 
AGTTGAAAAAACTGGTCTTAAGTTGGAAAAACTATTATTAATAATATTATCCTATC 
CATCCATATCTATTGAAATTGTCAGGTCCATAATTTCATTTTAATTAATTATAGGA 
AAGAAGAAAAGATAATACCCATTTGTTCTAT 



Sequence ID 430 

CAGGGGCTTCTGCTGAGGGGGCAGGCGGAGCTTGAGGAAACCGCAGATAAGTTTTT 

TTCTCTTTGAAAGATAGAGATTAATACAACTACTTAAAAAATATAGTCAATAGGTT 

ACTAAGATATTGCTTAGCGTTAAGTTTTTAACGTAATTTTAATAGCTTAAGATTTT 

AAGAGAAAATATGAAGACTTAGAAGAGTAGCATGAGGAAGGAAAAGATAAAAGGTT 

TCTAAAACATGACGGAGGTTGAGATGAAGCTTCTTCATGGAGTAAAAAATGTATTT 

AAAAGAAAATTGAGAGAAAGGACTACAGAGCCCCGAATTAATACCAATAGAAGGGC 

AATGCTTTTAGATTAAAATGAAGGTGACTTAAACAGCTTAAAGTTTAGTTTAAAAG 

TTGTAGGTGATTAAAATAATTTGAAGGCGATCTTTTAAAAAGAGATTAAACCGAAG 

GTGATTAAAAGACCTTGAAATCCATGACGCAGGGAGAATTGCGTCATTTAAAGCCT 

AGTTAACGCATTTACTAAACGCAGACGAAAATGGAAAGATTAATTGGGAGTGGTAG 

GATGAAACAATTTGGAGAAGATAGAAGTTTGAAGTGGAAAACTGGAAGACAGAAGT 
ACC 

Sequence ID 431 

CGCTGGGTGCCTGCAGCGCCTCCCTTGTCTCATATGGTGTGTCCAGCACTCTATTG 
TTGTAAACTGTTGNTTTGNCTGACCTAAATTNTCTTTACTAAACANATTTAATAGT 
TNAAAAAAAAAAAANANCA 

Sequence ID 432 

TTTTAAAGTCATCTCTATAGGAAGGTGCTGGGCAGGGATCCCAGAGAAAGAAAGGG 
TCCAAGACTCCATTAACTGCCCTGGATGAAGGGCACTGCTACAGCAGCTAGTACCA 
GAGACTCTCCTATCTCACGGTTGAGGCAGACCCAGGATAGAATAGAGAATAAAAGG 
AATGCTTATAGGAAACAATTTTGTATGGAATGCTAGATGGCCAAGCCTCAGCCTTT 
GGTCCAGTGCAACCCTTGCCTCGCTTGTCAACAGTGAAAAATTAGTTTGGTTAGAA 
GAACCATCTGGAAACACACCAGCTTCTGCTACCTTCATGCTCATTGTTAAAAAAAG 
ATTAACCAGTGTGAACATTCTGATCTGTTAATTCCAGGGACTGTTTTCTTTCCAAT 
GGACTGTTTGTTGGTAGAATAACCCCCAAAAGCTCAAAGCTAAAATGCATCATCAG 
TCCTAGTCGGCAGTTCCTTAAGAATGGACTGGCGGCGTGGTTGAGCTGATATGGAA 
AAGCTGCACCTTCCTGCAGAAGATCAACTGACCTGCTATCCCACCCCAAATTTCAA 
CCTGAGGTATATTTCAATGAAGGCAGGTAGCTGTGCTTCTCAGAGCA 

Sequence ID 433 

TCCCGGAATCGCGGCCGCGTCGACCCGCCGCCGAGGATTCAGCAGCCTCCCCCTTG 

AGCCCCCTCGCTTCCCGACGTTCCGTTCCCCCCTGCCCGCCTTCTCCCGCCACCGC 

CGCCGCCGCCTTCCGCAGGCCGTTTCCACCGAGGAAAAGGAATCGTATCGTATGTC 

CGCTATCCAGAACCTCCACTCTTTCGACCCCTTTGCTGATGCAAGTAAGGGTGATG 

ACCTGCTTCCTGCTGGCACTGAGGATTATATCCATATAAGAATTCAACAGAGAAAC 

GGCAGGAAGACCCTTACTACTGTCCAAGGGATCGCTGATGATTACGATAAAAAGAA 

ACTAGTGAAGGCGTTTAAGAAAAAGTTTGCCTGCAATGGTACTGTAATTGAGCATC 

CGGAATATGGAGAAGTAATTCAGCTACAGGGTGACCAACGCAAGAACATATGCCAG 

TTCCTCGTAGAGATTGGACTGGCTAAGGACGATCAGCTGAAGGTTCATGGGTTTTA 

AGTGCTTGTGGCTCACTGAAGCTTAAGTGAGGATTTCCTTGCAATGAGTAGAATTT 

CCCTTCCTCCCTTGTCACAGGTTTAAAAACCTCACAGCTTGTATAATGTAACCATT 

TGGGGTCCGCTTTTAACTTGGACTAGTGTAACTNCTTCATGCAATAAACTGAAAAG 
ACCATGCTGCTANTC 



Sequence ID 434 

TTCGGACGCAAGAAGACAGCGACAGCTGTGGCGCACTGCAAACGCGGCAATGGTCT 

CATCAAGGTGAACGGGCGGCCCCTGGAGATGATTGAGCCGCGCACGCTACAGTACA 

AGCTGCTGGAGCCAGTTCTGCTTCTCGGCAAGGAGCGATTTGCTGGTGTAGACATC 

CGTGTCCGTGTAAAGGGTGGTGGTCACGTGGCCCANATTTATGCTATCCGTCAGTC 

CATCTCCAAAGCCCTGGTGGCCTATTACCANAAATATGTGGATGAGGCTTCCAAGA 

AGGAGATCAAAGACATCCTCATCCAGTATGACCGGACCCTGCTGGTAGCTGACCCT 

CGTCGCTGCGAGTCCAAAAAGTTTGGAGGCCCTGGTGCCCGCGCTCGCTACCAGAA 

ATCCTACCGATAAGCCCATCGTGACTCAAAACTCACTTGTATAATAAACAGTTTTT 
GAGGGATTTTAAAA 

Sequence ID 435 

CTGCAATGTGCAATAGTTGCACCACTGCACTCCAGCCTGGGTGACAGAGTGAGAAC 
CTATCTCTTAAAAAAAAAAAAAAAAAAAGGAAGAAGAGACATGAGAGGGCCCAAGT 

CACTTGCTCACTCACTTTCCGTGTACATGTACCAAGAAAAGGCCATGTGGGAAAGA 
GCAAGAAGGCAGCCGCCTTCAAGACAGGAAGAGAGCCCTCACCAGAAACTGAGCCA 

GAACCTTGGAATTCCAGCCTCCANAACTGTGAGAAAAGAATTTTCTGTTGTTTCAG 

TCCCCCACACTATGGCATTTTGTTACGGCAGCCTGAGCTAATACTCCTACTTTGTC 

CTGCATTTACTTGGTCTTCCAGTTAGTTTTTTAGACTTTGGGAATCAGAGCAGTCA 

GTTGTCAGATTTTAGCTTACAGTTGTCCTACCTGTGCAACTGAAATTTCTTCCATT 

TTAAACCAGAGCAGAGTTTTAGAGTCAAAAGAAACCAGATCTTTTAGTGCAGAAGC 
TTTCCACTGTATTANAAGTGAGGAAGTTGGT 

Sequence ID 436 

AAAAAAACTCCAGAGAAGTTTATAGAAAGAGATGACATGTAAACCCTGCTGAAAAA 

TAGTTTCATTTGTTAGAATATAATTGTCTTCCACTAAAAAAAGAAAAAAAAAAGCA 

TTTAAGGCTCTAAGATCTCTTGAAGTACCACTTTTCCTGAATCCCAGAGTTTTTAT 

GTGCATTATTTTTATGCGTTTGTAGTTTGATATGTTGTATTTATAAGTAGTTTTAG 

CTTTCCATTATGAATTCTTCTTTGACCCATGAGTTATTTAGGTAAGTGTTTAAAAA 

TTTACAATAGTTTATATATGCAAATATTATGTTGTTAGAGTTGGTTTTCATGTCAT 

TTTTACATATACAGGGGCAGTTTCCCCAACTAAATTGTATATTCCTTAAAGCAGCA 

CTCTTAAATTTTATTTCTGTGTCAATTTCTTGNCTGTGTTTCCTGGCATGGAATAC 

ATGGCATAAAATTTGTTATGTAATTAAATGAAATATTATTATACTTTCTATTTTTT 
AGAAAAAA 



Sequence ID - 438 nt: 577 

GTCGACAGGGATGACATAACTATTAGTGGCAGGTTAGTTGTTGGTCACTTTCAACT 

CTGGGTTCAAGCGATTCTCCTACCTCAGCCTCCCGAGTAGCTGGGATTACAGGCAT 

GCACCGCCACACCTAATTTTCTATTCTTAGTAGAGACGGGGTTTCTCCCTGTTGGT 

CAGGCTGGTCTCGAACTCCCGACCTCAGGTGATCTGCCTGCCTCAGTCTCCCAAAG 

TCCTGGAACCACAGACATGAGCCACCACGCCTGGCCCCTTTTAAAATATTTCTGCT 

CATTGATGATGCACCCAGTCACCCAAGTGCTCTGATGGAGATGTATAAGGAGATGA 

ATGCTGTTTTCATGGCTGCTAATACAACATTCATTCTGCAACCCCCAAATCAAGAA 

GTAATTTTGACTTTCAAGTCTTATTATTTAAGAAATATATTTTGCAAGACTATAGC 

TGCCATAGACCGTGATTCCTCTGATGGATCAGACAAACTAAAATGAAAACCTCCTG 

CAACGTATTCATCATTCTAGATCCCTGAGGAATCGCCACACTGACTTNCACAATGG 
GTGAACTGGGTTACAGT 

Sequence ID - 441 nt: 552 

AAACAAAATTATTCTCTGAGAGGGAAAGGACATTTGAGGGAAACATCAAATTTCCC 
CATAAATAAATGAATGGAGTTTGCAGGAAGGTGAGGGTGAGCAGAGATGTGTGTGG 
ACATCTCTGACCATCCATCGCTGTATTCAAATGGATTGTTTTATTCCATTCTGGTC 
TCAGGCATGACCACGTCCAGTGAAGACATTTGAGGCAGCACATCTCAGGACCCAGG 
CAATAGACTGGCCCCAACTCAGGCTGGACTAAGGTGTGATTAATTCTTTGTTTTTT 
GTGTGGAACAGCTCACCTTGTCAGACAGCCTCAGGGCATCTCTGAGACACAGGGGC 
AGAAAATGACATTCATCTTTTGAGTCCTCATCCATGGAGTGCTGTGTTTGGGGGGC 
TGCATCTGCTGAAGCGAGAACCCCATTCTGCCACCCCACCAGGATGCCCATTCTCC 
AGGACTTCTCCAACTTACTATTAGACTAAACCAGAACAAGCAACAAACTGTATTTA 
TGCAAGCAAAATTGATGAGAAAATTATATTCAAATAAAGCAAAAATTA 

Sequence ID - 442 nt: 606 

TCGTGCCACTGCACTCCAGCCTGGACGACAGAGTGAGACTCCATCTCAAAATAAAT 
AAATAAATAAATAAATAAATAAATAAATAAAAAAATAAAAAATACTTCTGCTATGA 
AAAACCTAGTTGGTATTTTTGCTTATTTAATACTATAGAAATATGGTGATCTCATC 
TTTAATAGAGTGCTTTTAAGGTCCCCAGTGATAATCTCCTAAAATCATGAACTTTA 
AGAATTTATAATGTTAATATGAGGAAATGAAATCTGGATTATCTCACCACATATTA 
TATAATTCATTAGTGACAGAGCAAGAACTCCAGGTCACCTGTCTATTCCATGTTTT 
TCCTATCTGCCTTTAAATGTTGAGATACTACCCTTATCTCATGTGAATGGAGAAAC 
TGCCTAAAATGCTAAAACTGACTCAGAGGCACCCAGACATAAGTGAAGTGTGATTA 
GAAAATCCTGGTCAGTTGAGTCTTAGCCAAATGTGTACCTACTGTGTCTGCCTCTA 
TCAAGTCAATGAAAACATGATCTGAGAACTGTAAGTCCATTTATGGAAAGGGTTGA 
TTTANAGATATTTTGAACTTNCAGTGATGAGCCCCTTCTCAAATAG 

Sequence ID 446 

CGGACTCCTGTGCTAATTGTCAGCTTACATATCATTGTATAGAGACTGTTTATTCT 

TCATAAAGTTCTGCGTTTGGCATCTTCACTCTTTCCAAAATGTATCTGTACATCAN 

AAATGTCACTATTCCAAGTGTCTTTTTAGTGTGGCTTTAGTATGGCTTCCTTTTAA 

TATTGNACATACATTGNATCTTTGTTTTATGGNAATAAGTAATAAAAATGTAGACT 

TCATATTTTGTACAAAATGTCCTATGTACAGAATAAAAAAGTTCATAGAAACAGCC 
NANAA 

Sequence ID 447 

AGGCCGAGGCAGGCAGATCNCNTGAGGTCAAGAGTTTGAGACCAGCNTAGCTAACA 
TGGTGAAACCCCATCTCTACAAAAATATA-AAAATTAGCCTGG-GTGGTGATGGGC 
ACCTGTAACCCCAGCTACTCGGGAGGCTGAGGTAGGAGAATCACTTGAACCCGGGA 
GATGGAGGTTGCAGTGAGCCAAGATCGTGCCACTGCACTCCAGCCTGTGTGACAGA 
ACAAGACTCTGTCTCAAAAAAAAATAATAATAATAATAATAATAAAAAGGAATAAC 
ATAGCTAGGAATAAATTTAATCAAAGAGGTGAAAGACTTATACACTTAAAACTACA 

AAAAAAAAATCACTGAAGGAATTATAGACCCAAATAAAAATAAATAAAAAGACATT 

CTGTGTTTTAGGGAAAGAAGACTTAATATTGTTAAGATGTCAATACTACCCAAAGT 

GATCTACAGATTCAACATAATCCCTATCAAAATTCCAACAGCCTACTTTGTAGAAA 

TGGAAAAGCCAATTTTCAAATTCAGATGGAATTGCGAGGGGTTNTGAATAACAAAA 
CACNATCTTGGGGAAAAAAAACAAAAAAC^ 

TTATAAATTTACTACAAAGTTATAGTAATCNAA 

Sequence ID - 448 nt : 329 

TACGCACACGAGAACATGCCTCTCGCAAAGGATCTCCTTCATCCCTCTCCAGAAGA 
GGAGAAGAGGAAACACAAGAAGAAACGCCTGGTGCAGAGCCCCAATTCCTACTTCA 
TGGATGTGAAATGCCCAGGATGCTATAAAATCACCACGGTCTTTAGCCATGCACAA 
ACGGTAGTTTTGTGTGTTGGCTGCTCCACTGTCCTCTGCCAGCCTACAGGAGGAAA 
AGCAAGGCTTACAGAAGGATGTTCCTTCAGGAGGAAGCAGCACTAAAAGCACTCTG 
AGTCAAGATGAGTGGGAAACCATCTCAATAAACACATTTTGGGTTAAAA 

Sequence ID 450 

GAGCAGTGGCATGATCACACCTTACTGCGGCCTCCAACCCCTGAGCTTAAGTGATT 
CTCCCGCATTATCCTCCTGAGTAGCTGAGACTACAGGTGCATGCCACCATACACTA 
CTAAATTTGGGTCGGGTGGTGGTGGTGATTTTTTAATATTTTTGTAGAGACAGGGT 
CTCACTGTGATGCCCAGGCTGGTCTTGAACTCCTGGGCTCAAGCAGTCACCCACCT 
CAGCCTCCCAAAGCACTGGGATTACAGGTGTGAGCCACCACACTGGCCAGCTTTGT 
TTTGTTTTGATGACTAAGCTGCTCTTGCTAAAAGGGCTTCTCTCTGAACTTCCCTA 
CCTTTCTTCTGTTTCCCTGGGCTAGGGCTCCATGTTGGCAGTCCTACTCCCAATTA 
ACCTGGGGCTGTCTGGTTAACCTTTATAAGATCTGCAGTCATTGGGAGACCCGGGG 
ACCAGGAATATTGTTGTTGAGGGAGCTACCCTGGAAAGTGGATGGGTGGCCAAAGG 

Sequence ID 452 

TTTGGCTTTGCCTCTAGGCATTAGATGTTATCTTTGGAGGCATCCTTCTATGAGCA 
TTCATTTTTGGACCAAGCCTGGATTTACAATTCTATTACTGGCCCAGACTTCATTT 
CTATCCAATTTCATTCCACTGTGCTATAGTTTACAACATATAATTTGACTTATAAA 
TAATTCCTGACTATGGGTTTAAAGACTGAAAATGGATCAATAGAAACTTTGAAAAT 
GTTAACATCTTGATTGCTTTTCTCAGTGTAGAAATGGACAATGTTTAGCTTAAAAA 
CTGCATGTTTTTAATGAGATACGGGGTTGAAAGACTTATTCCTGGAATTTATTGTT 
GTGGAGAAAGCCTGTTGCTATCTGCCATACCTTGGTTTACTTTGTGCAAAATGAGC 
TTCTTTTTAAGTAATGAGCTCTTTCCATGTTCAGCTTAAATTGCTGTCTTAGACAC 
TTCATCAGGGTTCCCTGCTCTGCCTCATTCCCCCTTTTGCTCACTTGCAGCCTTTG 
ACATAATCCTGGGAGGCAATTGGCATCATACATATTTTGCTTTGTAATCTCCTGCT 
TTGATTCTGACTGGGACCCAGC 

Sequence ID - 453 nt: 747 

GGATCTAAGACCAGCCTGGCAGCCACCAGATGGTGATTCTAGTCCTGGCTCAGTCA 

GTAATAGGTCACTGACCCCAGAGAAATCAATTCAGCCTCCCCAGGTCCTTGGATTT 

CTTTCTGTGAAAATGAAAGCATAGGTAGGAATTTCCCATGGAACAGCTAGCAGAGG 

AGAAATATTAAAAGTCAGGAGACTCATGCTATAGTTTTCATACTTCATTACAACAA 

TGTTGTTTAGGACAAGTGAGTTAACCTGTTAGCTTCCTCTATATAAAATGGAAAGT 

CATTAAAAACCTACATAGCAGGGTTCTTGTGAAGATCAAGTGATAATGTAGGAAGC 

ATGTACAAATGTCACATTCTGCCGTCACGTAATGGTCCTCACAGCTTGAGGTAGCA 

TTTAGCATGTGTCATGATTTAGTACAAGGGTTGGCAAACTGTTGCTCTTGGATTAA 

GTCTGGCTCATTGCCTGTTTTTCAAAGAAAAAAATTGTATATGTGTGTATATATGT 

TATATATAGGTACACACACATATGTGCTATATATAGCATATATACACACATAATAT 

ATAAACATGTACATATATAGCATTATATATATACCGTGTATAATATCTCCAGTCCT 

CATGACCAGCCATGCTTGTTCATTTACATTTGCATACTCTATGATTGCTTTCATGC 

AACAATGGCAGAGTTGAGTGATTGTTTTGCACAGANACTGTATGGCCCACTAAACC 

TAAAATATTAATCTCTGCC 

Sequence ID 454 

CTCCTGCCGGGCTCGTGGCGGCTTCTGTCCGCTCCGCGGAGGGAAGCGCCTTCCCC 
ACAGGACATCAATGCAAGCTTGAATAAGAAAAACAAATTCTTCCTGCTAAGCCATG 
GCATATCAGTTATACAGAAATACTACTTTGGGAAACAGTCTTCAGGAGAGCCTAGA 
TGAGCTCATACAGTCTCAACAGATCACCCCCCAACTTGCCCTTCAAGTTCTACTTC 
AGTTTGATAAGGCTATAAATGCAGCACTGGCTCAGAGGGTCAGGAACAGAGTCAAT 
TTCAGGGGCTCTCTAAATACGTACAGATTCTGCGATAATGTGTGGACTTTTGTACT 
GAATGATGTTGAATTCAGAGAGGTGACAGAACTTATTAAAGTGGATAAAGTGAAAA 
TTGTAGCCTGTGATGGTAAAAATACTGGCTCCAATACTACAGAATGAATAGAAAAA 
ATATGACTTTTTTACACCATCTTCTGTTATTCATTGCTTTTGAAGAGAAGCATAGA 
AGAGACTTTTTATTTATT 

Sequence ID - 458 nt: 682 

TGCCACTGAAGATCCTGGTGTCGCCATGGGCCGCCGCCCCGCCCGTTGTTACCGGT 

ATTGTAAGAACAAGCCGTACCCAAAGTCTCGCTTCTGCCGAGGTGTCCCTGATGCC 

AAGATTCGCATTTTTGACCTGGGGCGGAAAAAGGCAAAAGTGGATGAGTTTCCGCT 

TTGTGGCCACATGGTGTCAGATGAATATGAGCAGCTGTCCTCTGAAGCCCTGGAGG 

CTGCCCGAATTTGTGCCAATAAGTACATGGTAAAAAGTTGTGGCAAAGATGGCTTC 
CATATCCGGGTGCGGCTCCACCCCTTCCACGTCATCCGCATCAACAAGATGTTGTC 

CTGTGCTGGGGCTGACAGGCTCCAAACAGGCATGCGAGGTGCCTTTGGAAAGCCCC 

AGGGCACTGTGGCCAGGGTTCACATTGGCCAAGTTATCATGTCCATCCGCACCAAG 

CTGCAGAACAAGdAGCATGTGATTGAGGCCCTGCGC^GGGCCAAGTTC^AGTTTCT 

GGCCGCAGAAGATCCACATCTCAAAGAAGTGGGGCTTCACCAAGTTCAATGCTGAT 

GAATTTGAAGACATGGTGGCTGAAAAGCGGCTCATCCCANATGGCTGTGGGGTCAA 

GTACATCCCCAATCGTGGCCCTCTGGACAAGTGGCGGCCCTGCACTCATGAAGGCT 
TTCAATGTGC 

Sequence ID 459 

TCCCGGAATCGCGGCCGCGTCGACCTTGTCCTTGAGCGTCAACCTTCTTTCCCTGA 
AGTGGCTGGGGTTCCTGTTTCCTTCTTTGATTGACAACTTGTGTTAACCCTCGCAC 
ATCTCTGGGCCAATTTTTGCTTGTAAGTCTTTCCGGAGACCCCTGGAATTTAAATC 
ATTAGCACCGCGCCCTTCCCCGAAGAGTCTTCGAAGGGTTGCCGCTTTTCGGTGGC 
GCAGTTCTCGCGAGAAGGTGACTTTCTTTCTCGGTATTTCCTGGTTTCCAGAATCC 
TTAGCGCGAGGCGGAAAAAATATTTCTCCCAGCTTGTGTTGATGCCGCGATTTTGA 
CTGAGACTTCTTCCCACGATTTCTGTTTTTGCTTCTCCAAGGAAAATGGCAGCTCC 
CGAGCAGCCGCTTGCGATATCAAGGGGATGCACGAGCTCCTCCTCGCTTTCCCCGC 
CTCGGGGCGACCGAACCCTTCTGGTCAGGCACCTGCCGGCTGAGCTTACTGCTGAG 
GAGAAAGAGGACTTGCTGAAGTACTTCGGGGCTCAGTCTGTGCGGGTCCTGTCAGA 
TAAGGGGCGACTGAAACATACAGCTTTTGCCACATTCCCTAATGAAAAAGCAGCTN 
TAAAGGCATTGACAAACTNCATCAACTGAAACTTTTAGTCATACTTTAATCG 

Sequence ID - 460 nt : 536 

CAGAGATCAAAATAGGCCTTACACAGTGCGACGCGAATTTAAAAGATTACCCCATT 

CAGGTGTATGGATTTTGCAGTATTAAAGATGCTGCCTGGAATAGGTCATTATCTTC 

TCCAAGTACTCTGTTAAGTCAATGAGTCACATAGAGTATAAGGTTTATTATCTGCT 

TTTCTTTCATTAAATAAATCTTTATTGAATTTCTACTACATTAAAAAACCAAACCA 

AAACAAAACAAACAAAAAAAACACTTCCCTGAGCCATAAAGGAGAAGGTAGTTTTG 

ACTGGAACCTTGAAGGATGGGTAAACTTTCAGCAGATAAAGATTGAGAGAAGACCT 

TCCAGGTAGAGAAAGCAGTGTGGGCACAGGCAAAGATGGAAGAACACACGTGGCTG 

TGGGAAACACAGCTAGAAGCCAGTGCGGATAGAGAGTAGGCTATGATGTGCAAAGG 

TTANACACTGGGAGAGACAGGTCCATGAGAGTAGCTTGGACTAACACAGGGAGGGT 

TTGGAATCCCAACTGGGGAACCTANAAATCAA 

Sequence ID 461 

TAGGAGGCTTATTCACTGATTTCCCCTATTCTCAGGCTACACCCTAGACCAAACCT 
ACGCCAAAATCCATTTCACTATCATATTCATCGGCGTAAATCTAACTTTCTTCCCA 
CAACACTTTCTCGGCCTATCCGGAATGCCCCGACGTTACTCGGACTACCCCGATGC 
ATACACCACATGAAACATCCTATCATCTGTAGGCTCATTCATTTCTCTAACAGCAG 
TAATATTAATAATTTTCATGATTTGAGAAGCCTTCGCTTCGAAGCGAAAAGTCCTA 
ATAGTAGAAGAACCCTCCATAAACCTGGAGTGACTATATGGATGCCCCCCACCCTA 
CCACACATTCGAAGAACCCGTATACATAAAAT 

Sequence ID 462 

TCTTTATCAAGTTGAGAAAGTTCCTCCCCTCTATTCCTAGTTTGCTAAGAGTCCTT 
CTATCCTATTTCTTAATGGTTTAGTAGATGACTCTGTGGTACTTTGAAGGTTGTTT 
GCAGAATTTCCATGCCATAGGCAATTTACCTTTCCTTGACATTTGAAGGATTGATG 
TTGGTGCCAAGTATAGAATCTTCACAGAGTCCTCCTGTAGCTTCTAAAGGTTTAGC 
TTGAAAATGTTAATTGCTTAACGCTAGTAAGTGAGTGAAAAAGCTGGGGATAAATT 
TTGTATCTTGCTTATATTTCAGTTCCCACCTCTGTCCNGACNAAACCCCCATATAT 
AA 

Sequence ID 463 

TAGTTTACATATCCCAACCTTTAAAAATATTCCTCTTATTAGCTTTATATTCACTT 
TATAGAAGTTGAGTTTTAATTAAAATTCTTGGCATCCTGAAGTATGTCACATAGCA 

TGTGCTCCTTATAAATATGTTGATATCTCAGAAGACAGCATCCCGGTTTTCATTTT 

ATAAAGTACCATACTTAAGAATGCTGTAATACTTATCTTTTATAACATGTTTCCTT 
CGCTTTGCTTGNCTTTTATGNCATCAGTTTTAACTGTTTACTTCATTTAACAGNTT 
ACATCATNCAACAGTTTACTTCATTAAACAGTAGGTGGAAAAATAGATGCCAGTCT 
ATGAAAATCTTCCCATCTATATCAAAATACTTTCAAGGATATACTTT 

Sequence ID - 464 nt : 615 

CGACTTTCAACCATCAAGTGAGGAATACCTTCACATAACTGAGCCTCCCTCTTTAT 
CTCCTGACACAAAATTAGAACCTTCAGAAGATGATGGTAAACCTGAGTTATTAGAA 
GAAATGGAAGCTTCTCCCACAGAACTTATTGCTGTGGAAGGAACTGAGATTCTCCA 

AGATTTCCAAAACAAAACCTATGGTCAAGTTTCTGGAGAAGCAATCAAGATGTTTC 

CCACCATTAAAACACCTGAGGCTGGAACTGTTATTACAACTGCCGATGAAATTGAA 
TTAGAAGGTGCTACACAGTGGCCACACTCTACTTCTGCTTCTGCCACCTATGGGGT 
CGAGGCAGGTGTGGTGCCTTGGCTAAGTCCACAGACTTCTGAGAGGCCCACGCTTT 
CTTCTTCTCCAGAAATAAACCCTGAAACTCAAGCAGCTTTAATCAGAGGGCAGGAT 
TCCACGATAGCAGCATCAGAACAGCAAGTGGCAGCGAGAATTCTTGATTCCAATGA 
TCAGGCAACAGTAAACCCTGTGGAATTTAATACTGAGGGTGCAACACCCCATTTTC 
CCTTCTGGAGACTTCTAATGAAACANATTTCCTGATTGGCATTAATGAANAGTCA 

Sequence ID 469 

GATTTTTAAAAATACATATAGCAAAAATATTACAGGGTCAGGGGAGACAATTAGAA 
TGATATAATTCAAAGTGGATTAAAAAAAAAACTGTCACCCAGAATACAATACCCAG 
CAAAGTTGTCCTTCATAAATGAAAGAAAAATNAAATCTTTNCCNAACNA 

Sequence ID 471 
TCCCGGGAATCTGCAGGATCCGTCGACT 

Sequence ID 472 

GACAGTGCCCAGGGCTCTGATATGTCTNTCACANCTTGNAAAGTGTGAGACAGCTG 

CCTTGTGTGGGACTGAAAGGCAAGATTTGTTCCTGCCCTTCCCTTTGTGACTTGAA 

GAACCCTGACTTTGTTTCTGCAAAGGCACCTGCATGTGTCTGTGTTCTTGTAGGCA 

TAATGTGAGGAGGTGGGGANACCACCCCACCCCCATGTCCACCATGACCCTCTTNC 
CACNCTNACCTGTGCTCCCTCCCCAATCATNTTT 

Sequence ID - 473 nt: 694 

TGGGCTTTGGGCTGGCTGCAGTCTGTCTGAGGGCGGCCGAAGTGGCTGGCTCATTT 

AAGATGAGGCTTCTGCTGCTTCTCCTAGNGGCGGCGTCTGCGATGGTCCGGAGCGA 

GGCCTCGGCCAATCTGGGCGGCGTGCCCAGCAAGAGATTAAAGATGCAGTACGCCA 

CGGGGCCGCTGCTCAAGTTCCAGATTTGTGTTTCCTGAGGTTATAGGCGGGTGTTT 

GAGGAGTACATGCGGGTTATTAGCCAGCGGTACCCAGACATCCGCATTGAAGGAGA 

GAATTACCTCCCTCAACCAATATATAGACACATAGCATCTTTCCTGTCAGTCTTCA 

AACTAGTATTAATAGGCTTAATAATTGTTGGCAAGGATCCTTTTGCTTTCTTTGGC 

ATGCAAGCTCCTAGCATCTGGCAGTGGGGCCAAGAAAATAAGGTTTATGCATGTAT 

GATGGTTTTCTTCTTGAGCAACATGATTGAGAACCAGTGTATGTCAACAGGTGCAT 

TTGAGATAACTTTAAATGATGTACCTGTGTGGTCTAAGCTGGAATCTGGTCACCTT 

CCATCCATGCAACAACTTGTTCAAATTCTTGACAATGAAATGAAACTCAATGTGCA 

TATGGGATTCAATCCCCACCATCGATCATAGCACCCCCTATCAGCACTGNAAACTC 
TTTTGCATTAAGGGATCATTGC 

Sequence ID 474 

GGCAGCGCGGGGAGCCCGTCGGCGCCGGCGGGCGGGCCGGTTTCGAAGTTGATGCA 
ATCGGTTTAAACATGGCTGAACGCGTGTGTACACGGGACTGACGCAACCCACGTGT 
AACTGTCAGCCGGGCCCTGAGTAATCGCTTAAAGATGTTCCTACGGGCTTGTTGCT 
GTTGATGTTTTGTTTTGTTTTGTTTTTTGGTCTTTTTTTGTATTATAAAAAATAAT 
CTATTTCTATGAGAAAAGAGGCGTCTGTATATTTTGGGAATCTTTTCCGTTTCAAG 
A 

Sequence ID 475 

CATAATAAAAAACAATCAACAAACAGGGAATGGAAAGAAACTTCCTCAGCATGGTG 
AAGGCCACATATGAAAATCCCACAGCTAACATCATACTCAATGATGAAAGACTGAA 
AGCTTTTCTCCTGAGATCAGGAACAAGACAAAGATGTCACCTTTTGTCACTTCTAT 
TCAACTCATTATTGGAAGTTTTTGCCAGAGCAATTAGGTAAG 

Sequence ID - 476 nt: 476 

CAGAATCTTTTCATAGGCTGAATGTTGCTCCACAATGTGTCCTTTGACTATCTCTG 
GCTAATTATTATTTTAATCTCTTCTCAGCTTTTCCAAGAACATAACGTTAACCAAA 
GATCTTAGGCCATTCACAACTCTTTTGTAAAAATTAATGTGGATGTGAAACGAGGC 
AACAAATCCTGAAGTAGAAAGTTATTCCTGGCCAGGCACGGTGGCTCACGCCTGTA 

ATCCTGGCACTTTGGGAGGCCGAGGTGGGTGGATCATGAGGACAGGAGATCGAGAC 
CATCCTGGCCAACATGATGAAACCCCATCTCTACTAAAATACAAAAAATTAGCTGG 
GCATGGTGACGCGTGCCTGTAGTCCCAGTTACTCGGGAGGCTGAGGCAGGGGAATT 
GCTTGAACCTCGGAGGTGGGAGGTTGCAGTGTGCCGAGATCACGCTACTGCACTCC 
AGCCTGGCAACAGAGCAAGACTCCATCT 

Sequence ID 477 

AAACAGAAAGTTTCTTCTAAAGGCATGATTCAGTTAAGTCATTCTTAAGTGTTAAA 
AAATTGTGAAAAATGTGCCTGTAATCCCAACACTTTGGGAGGCCGAGGCAGGCAGA 
TCACGAGGTCAGGAGATCAAGACCATCCTGGCTAACAAGGTGAAACCCCGTCTCTA 

CGAAAAATACCAAAAACATTAGCCGGGCGTGGTTGTGGGCGCCTGTAGTCCCAGCT 

ACTTGAGAGGCTGAGGCAGGAGAATG 

Sequence ID 478 

TTCTTGGGATATTGATGACTACTGTCTGAGAGGTGCTGTGGGGAGATTTTCAGGAT 

TGTGTGGTCTTTGAGGGGGGTGTTTTTTTAAGACAACATTGACCACTGTCCACTGT 

CCACATGATCATTGTAAAATTGCAATGCCGCATGCTAGTTGGTTACATAAGACATA 
ATTCCAGTGATTGAAGGTGGTTACACTGTATGGTGGTGTGTTCAAGATGGCACTGG 
CATCTTTGAGCAGAGCCTGGCTATGCAGCATCATTTGAGTTTTTTAAACACCCTAN 
AGGTCTGGTTGTTGTTGCTGTTGTCCTTTCCTGTGAAAGTCACAANANAAGTTACA 
GTCCAGGTGAACCTGGAGTTTATAGGTTGGTTTTGTTTCTGNTATATATATATATA 



TCACCTGCATGCTATTTCTAGTGAGTGCTAAATACAGTATGGTCCAATGACAATAA 
CAGCCCATGGTACTGCCAG 

Sequence ID 479 

CATCAGTCTGTTATCCATGCTGACTTTCCGAAGACTTGCAGCTACTGCATTGATAT 
CTTTCCTGCCAATAAGCAAAGTGTTGAACACTTCACAAAATATTTTACTGAGGCAG 
GCTTGAAAGAGCTTTCAGAATATGTTCGGAATCAGCAAACCATCGGAGCTCGTAAG 
GAGCTCCAGAAAGAACTTCAAGAACAGATGTCCCGTGGTGATCCATTTAAGGATAT 
AATTTTATATGTCAAGGAGGAGATGAAAAAAAACAACATCCCAGAGCCAGTTGTCA 
TCGGAATAGTCTGGTCAAGTGTAATGAGCACTGTGGAATGGAACAAAAAAGAGGAG 
CTTGTAGCAGAGCAAGCCATCAAGCACTTGAAGCAATACAGCCCTCTACTTGCTGC 
CTTTACTACTCAAGGTCAGTCTGAGCTGACTCTGTTACTGAAGATTAGGGAGTATT 
GCTATGACAACATTCATTTCATGAAAGCCTTCCANAAAA 

Sequence ID 481 

CACACTTTCATGATAAAAACAGAACCTAGGAATGAAAAGAAATTATAGCAACATAA 
TAAAGACCATATATGAGAAGCCCACAGCTAACATACTGTATGGTGAAAAACTGAAA 
GCTCTTCCTCTAAGATCAGGAACAAGGCAAGGATGCCCATTCTTGCCACTTCTATC 
GAACGTAGTACTGGAAGCCCTAGCCAGAACAACTAGGCAATAGAAAGAAATTAAAG 
GCATCCATNTCAGAAAGGAAGAANCAAAATGCTGTCTGTTTAANATGACA 

Sequence ID 482 

TTTCTATANAAAAAAATTTTTTAAAATAATTGTAAAGTTAGATTTAAAATTGTAAA 

ATATAAAATCACAAAGGAATGTACCCAATAAAATGTAAATGCNCCATAAAAAAAAA 
AAAAAAAAAAAAAAAAAA 

Sequence ID 483 

CGNTAACGTGCAATCCGCCGCACGCCAGCAAACTGGACAAACTCCGGGATCTCATC 

GAAGCGATTGAGCACCAGTACCAGAGTAATACCGGACTGATGTAACGAGGCGAGTC 

GCTCATCCAGCTTGCTGACGTGAGGCAACATCCAGGCCATCGAACGGNTCATCAAG 

AATCAACAAGTCAGGCTCCGACATCAGCGCCTGACACAGCAGGGTTTTTCGCGTCT 

CGCCAGTGGAAAGGTATTTAAAGCGTCNGTCGAGGAGGGCGGTAATACCGAACTGC 

TGCGCCAGTTGCATGCAACGCGGTGCATCCTTTACTTCATCCTGAATGATCTCAGC 

CGTAGTGCGTCCGGTGCCATCTTCGCCAGGGCCGAGCATATCGGTGTTATTCCGCT 

GCCATTCGTCGCTGACGAGTTTTTGCAATTGCTCGAAGGAGAGACGAGTGATGTGG 

GAAAACTGGCTTTGCCGTTCACCTTTCAAAAGCGGGAAGTTCCCCCGCCAGCGCGC 
GGGCCAGGGCCCGAT 

Sequence ID 484 

TTTTTTTTTTTTATTCTATTAAAAAATGTTNOTGAAAAAAGATACTTAAATTTTAA 
ApATAACTNAATTCCTAANGATTTAAAATAATCCAAGCAGAGATGAAAGANCAAAT 
GCAAATGCNTAAAAAGACCCCANAGCATTGTTAGCAAAAAGCAAATATAGTTAGCC 
AAGCATATATATNTCATAAAAGCAATAANAAGGCNTAAAGCAAGTTTGGGGAGAGC 
TTATTTAAAACTTGTAAAAATCATTTGAATTTTTAAAAGTTTTCAAAC 

Sequence ID - 485 nt: 551 

TAGAAAGATAGAGACAGAGTGATTTGCAAAATAATGGAGGATCATATTTATATATG 
AATTTTCACTTATTTGAACTTTCAGATATCANCTTNAAAANCTTTGGTTTAAGTAA 
AGTNTNTTAATGAGACTCCTTGGATGAAAGTAACCAAAACCAGTAAAAATAAGGTA 
ATAAGGATGTAATAGTTTCTTATGGACACTCAACAGCTAGAATGCAGTTAGTCTCA 
GAAAAGAATTAGAACAAATAACTGGAAGGCCATCAGGAGTCCAAAACCATCACTCT 
TTTATATTTTATATTTTATTTTTCTCTCTTCANATGAGCATTCTCTTTCTATGTCC 
ATATGGTANAAGGCGGCAGCTCCATAGATTATGGCTTCAGATGTTACAGTTCCGCT 
NAATGCAGGGACAGACTTGCTATCTTTCAGTCCCCTTACATATCCTGGGGAGAGAG 
CAAATGATTGACTGGCTTGAGTCAGGTGCCCGTTCCCTTTCCAATCT 

Sequence ID - 487 nt:224 

GTTTGNTTGTGACCATCTGTACTTGTAATTTCTTTACNTTCATTGGTATGAAAAAT 
ATGTTCTTAGAAGGANGAAAAAGAATTCAGNTTTGCTTTGTATACTAAATTAAATG 
CTGTAATTTTGATAAAATGAAAAATCTGCTTTATTTGCAACAATTGGTTTCTTCCT 
TGACGTCAGCCTCACTCTTGGACTTTGGTATTCAGCCNGNCACCCCTGGGAATTCC 

Sequence ID - 488 nt: 349 

GTGCCTCCCTGTGTGAGTAGCCTAAGGTGCATTGAAAAAGACTGGGATGTGTTTTA 

TTTTTTTGTATTAGATAGCATTAACCTTACTGTTGAAGTATTTTTGGTGGAGTATT 

AGTGACAAGCCATTGAGTCTTAAGCCTTACGGCTTCCTATAAAATCACTAATTTCG 

TGTGTGTTTGTGTGTAGGTTACGTTATATATAGGATTCGTGTTCGCCGTGGTGGCC 

GAAAACGCCCAGTTCCTAAGGGTGCAACTTACGGGAAGCCTGTCCATCATGGTGTT 

AACCAGCTAAAGTTTGCTCGAAGCCTTCAGTCCGTTGCAGAGGANCGAGCTGGACN 

CCCTGGGGGGCTC 

Sequence ID 489 

TTAACAGCTGCATAGAGTTTTAAAAGTACATTATATTTTGTCAGACTAGTAAAATA 
TCTGTTTTTCACGCAAAAAAAGCCATGAAATACGTAATTTTTTAAAGACAAAAAAT 
CATCTTTTGAGTTTGCTCTTTGGTTTTTCTTCATTCCTTTTGAGGATTGGGAAAAC 

AGAAAGATTCTTTGATTTGGGTAATGAAGAGGTAATTTGGGACAGTGTGGTGGTAC 

CAGGAAGAAAGAGGATTGGAAAGGCCAGTACTGTTTTAGTTGCTCGGCACTGTTGG 

TTTTGTTTTAATGTGGTTGCCCTGTCCACTACATGGTTCTATCAGTAGTGTAATCC 

ATTTTCAATGTAAAGCTCTTTTAGTTTTTGTCATAGACATAAATTAATATTTTGAG 

AGGCATCCCTCACCTGTTCATTTCTTCTGTGTTGAAATGAAGTACTTAAAATTACC 

GTTATACATGAACTTTGTGGACTGTAAGATTTGTTATATATGTTCAAATGCCTTTT 

AGCTGGCTTTTTAATTAATATGCCTGTTTTGAGTGCTTAATACAATGTAATGNGGA 

TTGTAAATCATACCTATTTTAAATCATTCCTTCCTGTATATTTGNACTCAGAGAGC 
CTTATTTTATTCTTCCAGC 

Sequence ID - 491 nt: 382 

TTTTCTTAGAACTTTATTTTTTCTGGCCAGGCGCAGTGGCTCACACCTGTAATCCC 
AGCACTTTGGGAGGCCAAGGCAGGTCGATCACCTGAGGTCAGGAGCTCAAGACCAG 
CCTGGCCAACATGGTGAAACCCTGTCTCTACTAAAAATACAAAAATTAGCTGGGCG 
TGGTGGCGCATGCCTGTAATCCCANCTACTCAGGAGGCTGAGGCAGGAGAATTGTT 
TGAACCCGGGAGGCGGAGGTTGCANTGAGCCGAGATTGCGCCACTGCACTCCAGCC 
TGGGCAACAGAGCGAAACTCCATCTCAAAAAAAAAAAAAAAAAACAACCTTTATTT 
TTTCTGATTTTAAAAGTAATAACTAGTTTGTAGAAACATTAAAAGT 

Sequence ID 492 

ACCCTAAACATAACTTAAAATTTGTTNGGAATTTGAAAGTACAGAATTTTCCTGTA 
ATTGAGACTNTTTAAACTTTTGTGGTTGGAGAAGGTATTCTATTTTTTGAAAATAT 
CTGTAAGTTTTATCTAAATAGTAAACTCTAAGTATTCTTCCCCTTTACTTACAGCC 
ACCCTGGGAATCTGAGACTAGAGAAAATAAAGTTTGTCTCTTGTTCTAAGGAGGGT 
CTGGTTTAGAAATCTGATTTAGACATAGAAAAATTGCAAGAAGCTTGAGGTGATTG 
GAAGATACGATTTTGTTATCAAAGNATGTTTCTGTTTTATAGATTTTATTCATCTA 
CAACTCCTTATTAATATATTTAAGAAGTCATTAACCCACCATTGATTACTTGATAT 

GTTTAGGATTTTTTTTTAAATTCTAAGAGTTTCTGTCATTTGGGGACAATCAGAA 

Sequence ID 493 

TGGGAATCATAATTNGTTAACTGAAGCTNATAAGATGAGAGCATTCANAGAGAAAA 
GAACGGAAAGATTGAATATCAGTTTCCCTTCTTTAAAAAAATTGTGGATATGTGAT 
CTAGCTTCTTGAGCATCACAGTGACTGATTGGCTCGTGGTAATTGATCGCTATGCT 
GACAATCTTATCTCCACCTATGTCATTCAATTTTCTAAGAGGCAAAATCCTTAATC 
AGGAGGAGAGTTTAGCTCTAGCTAAATTTCCCTTGTCCAGCATGCTCCTGCTCCCC 
CAACTTGTGGAAACAGCTAAAGGATTGGACTAGGAGCANAAGTTTGGAATGGTTAA 

AATGTAGCAACATGTGTTTCCTGAAACAAAATTCCACTATAATAAAAAAAGCATTT 

GAATGCTCCCTTGTAATTCTGTTGGAGCTTGTTGCCTTTTTTATGACACAACCATA 

ATCAGTGATAGACAGTAGCATAAAGAAGCAAGAGCAAAGCAATTAAGTAATAATAG 

CACTACAAAAATGTGTGCTGTACTTACCAAACACGACATTTATGAATTATTANATA 
GGAATAAGGGGATGGT 

Sequence ID 494 

GACCCAGCCATCTAAATAAGTTRTACATGTTGCGTATTTTTTTGTTAGGGACTTAT 

CTTCCGAAGAGGAAAGGTTTATGAAACCTAAAGTAACAATGATAGCTTGGAATCAA 

AATGATAGCATTGTTGGCACAGCTGTGAATGATCATGTCCTCAAAGTGTGGAATTC 

TTACACTGGACAACTGCTTCATAACTTAATGGGACATGCTGATGAAGTATTTGTTC 

TGGAGACACATCCCTTTGATTCCAGAATTATGTTATCTGCAGGACATGATGGCAGC 

ATATTTATATGGGATATTACAAAAGGTACCAAGATGAAACATTATTTTAATATGGT 

AAGTGAAGTGAGATGTACCTTGATACATGCTTGATAATTTGTTTAGAGTATTTGGG 

TTATGCGGCTTACCCAGAAATTGATCTGCTTGTTTTGGCAGTTTGTTTTTACAAAT 

CAACATATTCAAAGCCTGCTAAATATTAGACAGCTACATGTATATACGTACATACA 
TGAA 

Sequence ID 495
TTTC 

Sequence ID 496

CTCGCTGGCGGGAGGCCACGGGCTTTCCACAGCGCGGGGGAACGGGAGGCTGCAGG 
ATGGTCAAGCTGACGGCGGAGCTGATCGAGCAGGCGGCGCAGTACACCAACGCGGT 
GCGCGACCGGGAGCTGGACCTCCGGGGGTGATCTGGACCCTCTGGCATCTCTCAAA 
TCGCTGACTTACCTAAGTATCCTAAGAAATCCGGTAACCAATAAGAAGCATTACAG 
ATTGTATGTGATTTATAAAGTTCCGCAAGTCATAGTACTGGATTTCCAGAAAGTGA 
AACTAAAATTTTAATCCAGGTGCTGGTTTGCCAACTGACAAAAAGAAAGGTGGGCC 
ATCTCCAGGGGATGTAAAAGCAATCAAGAATGCCATAGCAAATGCTTNAACTCTGG 
CTGAAGTGGANAGGCTGAANGGGTTGCTGCAGTCTGGTC 

Sequence ID 497

GAAGACCTCACATCTGAGAGCTCATCTGCGTTGGCATTCTGGAGAACGCCCTTTTG 
TTTGTAACTGGATGTACTGTGGTAAAAGATTTACTCGAAGTGATGAATTACAGAGG 
CACAGAAGAACACATACAGGTGAGAAGAAATTTGTTTGTCCAGAATGTTCAAAACG
CTTTATGANAAGTGACCACCTTGCCAAACATATTAAAACACACCAGAATAAJy^AAG
GTATTCACTCTANCAGTACAGTGCTGGCATCTGTGGAAGCTGCGCGAGATGATACT 
TTGATTACTGCAGGAGGAACAACGCTTATCCTTGCAAATATTCAACAAGGTTCTGT 
TTCAGGGATAGGAACTGTTAATACTTCCGCCACCAGCAATCAAGATATCCTTACCA 
ACACTGAAATACCTTTACAGCTTGTCACAGTTTCTGGAAATGAGACAATGGGAGTA
AATATTACACAAATACTTATTCATTGNGGTTATTTTTATACAGTAGTGAGAAGAAT 
ATTGTTCCTAAGTTCTTAGATATCTTTTTTTGGATGTGCAAAAATTTTTGGATTGA 
CAGTAACTTGGGTATACATGACACTGAAATGCCTTACTTTGGATGA 

Sequence ID 499

TGCCTGCGGGCGAGGACCTCGCCCAGCCCATGTTCATCCAGTCAGCCAACCAGCCC 
TCCGANGGGCAGGCCCCCCAGGTGACCGGCGACTGAGGGCCTGAGCTGGCAAGGCC 
AAGGACACCCAACACAATTTTTGCCATACAGCCCCAGGCAATGGGCACAGCCTTCC 
TCCCCANAGGACCCGGCCGACCTCAGCGCCTCCTGCAGGCTAGGACACTGGTGCAC 

TACACCCCATGCCTGGGGGCCGAGATTCTCCAGCAGAAAGATGCAATATTTTTTGT
TTCCTTTTTTTCCATTTTTTTCTCTAAGGAATCAATATTTCAATATGTTGAGTGTG 
TGTCCAATGCTATGAAATTAAAATATTAAATAACATATTTATGGCATTTTCTTGAA 
GAGTGTGGTTGAAGAAATATTTCTCCTTTTGTTTTTCTTTTTTTTTTGNTTGNTAC 
TGCCACTTCTTTTTAGGAGCAAATCTCCCCAGGGGTGTACGGNATTTCTTGACTCT 

GGGAACAGCTGCTACCCCCAAGACTTGCCACGTTGTTCTGCCCTCAAATGGAATTA
AGTG 

Sequence ID - 500 nt: 390

GGAATATGGTCAGGATCTTCTCCATACTGTCTTCAAGAATGGCAAGGTGACAAAAA 

GCTATTCATTTGATGAAATAAGAAAAAATGCACAGCTGAATATTGAACTGGAAGCA

GCACATCATTAGGCTTTATGACTGGGTGTGTGTTGTGTGTATGTAATACATAATGT 
TTATTGTACANATGTGTGGGGTTTGTGTTTTATGATACATTACAGCCAAATTATTT 

CTCAAATTAAGAAATGCATTTAACCATGTAAAANATGANTGCTAAAGTCAGCTTTT 

TAGGGCCCTTTGCCAATAGGTANTCATTCAATCTGGTATTGATCTTTTCACAAA

Sequence ID 502

ACCCGCCATCTTCCAGTAATTCGCCAAAATGACGAACACAAAGGGAAAGAGGAGAG 
GCACCCGATATATGTTCTCTAGGCCTTTTANAAAACATGGAGTTGTTCCTTTGGCC 
ACATATATGCGAATCTATAAGAAAGGTGATATTGTAGACATCAAGGGAATGGGTAC
TGTTCAAAAAGGAATGCCCCACAAGTGTTACCATGGCAAAACTGGAAGAGTCTACA 
ATGTTACCCAGCATGCTGTTGGCATTGTTGTAAACAAACAAGTTAAGGGCAAGATT
CTTGCCAAGAGAATTAATGTGCGTATTGAGCACATTAAGCACTCTAAGAGCCGAGA
TAGCTTCCTGAAACGTGTGAAGGAAAATGATCAGAAAAAGAAAGAAGCCAAAGAGA 
AAGGTACCTGGGTTCAACTAAAGCGCCAGCCTGCTCCACCCAGAGAAGCACACTTT 
GTGAGAACCAATGGGAAGGAGCCTGAGCTGCTGGAACCTATTCCCTATGAATTCAT 
GGCATAATAGGTGTTAAAAAAAAAAAATAAAGGACCTCTGGG 

Sequence ID - 503 nt: 109

ACATTTTCCGGNCCTTTTGCCATACACAGTTACAGAGATCAGTCAAATCCATACCA 
CCACTGAGATCTCATTTATTGCCACAGATGCACAAAATAAATAACCCAAAATC 

Sequence ID - 504 nt: 374

CCAGCAACGACCCATACCTCAGACCCGACGGCCCGGAGCGGAGCGCGCCCTGCCCT 
GGCGCAGCCAGAGCCGCCGGGTGCCCGCTGCAGTTTCTTGGGACATAGGAGCGCAA 
AGAAGCTACAGCCTGGACTTACCACCACTAAACTGCGAGAGAAGCTAAACGTGTTT 
ATTTTCCCTTAAATTATTTTTGTAATGGTAGCTTTTTCTACATCTTACTCCTGTTG 
ATGCAGCTAAGGTACATTTGTAAAAAGAAAAT^AAACCAGACTTTTCANACAAACCC 



TTACAGTATTTGTAAGAATAAAGCANCATTTGAAATCG

Sequence ID 505



GTACAGGAGGTAZ^ATTGGATACCCCATCTAAGGGGATCTGTGAGACCAGGTAGTTA 

TTTGGAATGAAAGAGTAAGATATTAAACCAGCCAGCATGTCAACAGGTGGGTGATA 

GTCTTGTTCTCACAGAGAACAGATGGCCATCATCTTAAAACAACATTTATGTTAAC 

CAGCAGATAAGGGACTCCTGCATTGTCAGTGGACTTTGAGCCTGAGTTTTTCTACT 

TGCATAGGTGAAAGTGGACTGCAATGCTAGTATAAATGCCGTATGATGACTAGTAC 

CCCTTAGGGAGOTCCAGTTTGCCTTCCTGGGGAACC 

CCTGAGGACAGCCCGACTTCT 

Sequence ID 506 

GTTACTGTGAGCCTGTCAGTAGTGGGTACCAATCTTTTGTGACATATTGTCATGCT 
GAGGTGNGACACCTGCTGCACTCATCTGATGTAAAACCATCCCANAGCTGGCGAGA 
GGATGGAGCTGGGTGGAAACTGCTTTGCACTATCGTTTGCTTGGTGTTTGTTTTTA 
ACGCACAACTTGCTTGTACAGTAAACTGTCTTCTGTACTATTTAACTGTAAAATGG 
AATTTTGACTGATTTGTTACAATAATATAACTCTGAGATGTGTGAAAAAAAAAAAA 
AAAAAAAAAAAAA

Sequence ID - 507 nt: 521

CTGCGGTGGAGCCGCCACCAAAATGCAGATTTTCGTGAAAACCCTTACGGGGAAGA 

CCATCACCCTCGAGGTTGAACCCTCGGATACGATAGAAAATGTAAAGGCCAAGATC 

CAGGATAAGGAAGGAATTCCTCCTGATCAGCAGAGACTGATCTTTGCTGGCAAGCA 

GCTGGAAGATGGACGTACTTTGTCTGACTACAATATTCAAAAGGAGTCTACTCTTC 

ATCTTGTGTTGAGACTTCGTGGTGGTGCTAAGAAAAGGAAGAAGAAGTCTTACACC 

ACTCCCAAGAAGAATAAGCACAAGAGAAAGAAGGTTAAGCTGGCTGTCCTGAAATA 

TTATAAGGTGGATGAGAATGGCAAAATTAGTCGCCTTCGTCGAGAGTGCCCTTCTG 

ATGAATGTGGTGCTGGGGTGTTTATGGCAAGTCACTTTGACAGACATTATTGTGGC 

AAATGTTGTCTGACTTACTGTTTCAACAAACCAGAAGACAAGTAACTGTATGAGTT 
AATAAAAGACATGAACT 

Sequence ID 508 

AAGCTCATGATTTTAAATGTATTTTTCTAATAAACTATACTCCCATTTAAAAATCA 

CCAATACCTTAATGTTTCAATTATATAAGCTAATTAAAAATAAAGGCTGGGCGTGG 

TGGCTCACTTTGGAAGACCGAGGCAGGCAGATCACCTGAGGTCAGGAGTTCGAGAC 

CAGCCTGCCCAACATGGAGAAACCCCATCTCTACTAAAAATACAAAATTAGCCAGG 

CATGGTGGCACATGCCCGTAATCCCAGCTACTGGGGAAGCTGAGGCAGGAGAATCA 

CTTGAACCTGGGAGGCAGGGGCTGCAGTGAGCCGAGATCATGCCATTGCACTCCAG 

TCTGGGCAACT^TAGTGGAACTCCATCTCAAAAATAATAAAAAAAATAAAATAAAA 

ATAAAATTCAAACCTAAAATAGATGCTCTACTTCAGGAGTGGGCAAATTAATCACC 
TGCATCCTTTTTTTGGGCTTTC 

Sequence ID - 509 nt: 575

TTTTTTTCTAAATGGNGATTACTAATATATGTGGAGACTATTAATCTCTTTTCTGT 

TGCCATTAGTTCATTTTTCCCCAAAAGCCAATACATGTTCATTACAAAAATGAATT 

ATAAAATATAAGTTAAAAGAAAAACATAAAACCCTACAATCTTACCCACCCAGACA 

ACTACTATTAATACCTTAGTATTAACATATACACATCATGTATATGTATAAATTTA 

TCTTAAACAAAAATAAAATTATTCTTTACATATTGTTTTAAAACCTATTTATCTGG 

CCAGGTGCCGTGGCTCACGCTTGTAATCCCAGCACTTTGGGAGGCTGAGGCACGTG 

GATCACCTGAGGTCAGGAATTCGAGACCAGCCCAGCCAACATGGTGAAACCCTGTC 

TCTAATGGTTTAAATACCAAAAAATTAGCTGGGCATGGTGGCACATGCCTGTAATA 

TCAGCTAACATGGGAGGCTGAGGCAGGAGAATCACTTGAACCANGGAGGGGGAGGT 

TGCAGTGAGCCGAAATCACACCACTTCACTGCAGCCTGGGCAACAAAGCAAGACTG 
TCTCAAAAAGAAAAA 



Sequence ID 510

CACTGTCATTCCCAGGAGGCTTTGGAGTCAGAACTGGATTCAAATTCTGACTNTAT
GTTGTGTGACTTGGGCCAATAGCTTCTTTNTGTGCCTCAGTTTCTTTAGCTGTAAA 
TANACGGGTAGGTCACCCCTTACCCCATAGGTTATGGGGAAAGTTACAGAAAATGG 
TCAGCTGGGCNCAGTGGCTCAAGCCTGTGGTCCCAGCNCCTTGGGAGGCCAAGGTG 
AGCAGATTGCTTGAGCCCAGGAGTTTGACACCAGTNTGGCAACGTGACGAAACCCT 
ATCNCTGTGAAAAATACAAAAAATTAGCCAGGCATGGTGGTGTGTGTCTGTGGTTC 
CAGCTGCTTGAGAGTTTGAAGTGGGAGGATCACCTGAGCCCAGAAGGTCGAGGCTG 
CAGTGAGCTGTGATCGCGTCACTGCACTCCAGCCTGGC - GACAGAGTGAGA- CCCC 
T - TTTGAAAAAAAAAAAAAAAAAAT 

Sequence ID 512 

GTGAGCGGTGGTGGTTTATTCTTCCGTGGAGTTAAGGGCTCCGTGGACATCTCAGG 

TCTTCAGGGTCTTCCATCTGGAACTATATAAAGTTCAGAAAACATGTCTCGAAGAT 

ATGACTCCAGGACCACTATATTTTCTCCAGAAGGTCGCTTATACCAAGTTGAATAT 

GCCATGGAAGCTATTGGACATGCAGGCACCTGTTTGGGAATTTTAGCAAATGATGG 

TGTTTTGCTTGCAGCAGAGAGACNCAACATCCACAAGCTTCTTGATGAAGTCTTTT 

TTTCTGAAAAAATTTATAAACTCAATGAGGACATGGCTTGCAGTGTGGCAGGCATA 

ACTTCTGATGCTAATGTTCTGACTAATGAACTAAGGCTCATTGCTCAAAGGTATTT 

ATTACAGTATCAGGAGCCAATACCTTGTGAGCAGTTGGTTACAGCGCTGTGTGATA 

TCAAACAAGCTTATACACAATTTGGAGGAAAACGTCCCTTTGGTGTTTCATTGCTG

TACATTGGCTGGGATAAGCACTATGGCTTTCAGCTCTATCAGAGTGACCCTAGTGG 

AAATTCGGGGGATGGGAAGGCCACATGCATTGGAAATAATANCGCTGCAGCTGTGT 
CAATGTTGAAACAAG 

Sequence ID 513 

TTTTTTTTTTATAAACTCCAATC^TTTCCAGAGCTACTTAGCTCAGCIATCTTTTTT 
TTCCACGCTCTTAAGTTGTGTTTATACATTTTTGATACAGTTAGATTGTTTTTGTC 
ACATTCTTCATTCTATCCTGGGATCCCCCAACCACCTAAGTGGATTTTTTGATAAT 
TTGCATGCTTTAAGGATAACTCTTCATTCTGNAAAGGGCTATGGGTTTTGGCAAAT 
GCAGAGTCATGTATCCAAGATTACAATATCGCACAGAAGAGTTTCATCACTATATA 
AAACTCACCAGTCTTCCTCCTATTCAACCATCTCCATGCCTTCTTCCCAGCCCTAA 
CTCCTTAAAACCACTCATATCTTTACTATTGCTATAGTATTGCCTCTTCCACCATG 
TCATATAAATGGAAACATACAGTATTAGTCTTCTCAAACTAGTTTCTTTTACCTAA 
C^^CATGGATTTAAGATTCATAGTGTCTTTTAATGACTTGATAGATTATTTCTTTG 
TAGCTGAATAATATTGCATCTTATAGATGTAACCGTTTGTATATCCATATTTTCTC 
ACAGCCTATGACTTGNCTTTTGATTCTCTGAACAGGCCATTCACAAAGCAGAAGTT 
TTAATTTTTATAAAGCTAATGNATCAACTT

Sequence ID 515

CCTGGATGACAGCATATCTGTTTATAGCTCAGTTTACTGAATACTTTAAGCCCACT 
GTTGAAACCTGCT 

Sequence ID - 518 nt: 502

GATGCATGTCCAGCATAGGCAGGATTGCTCGGTGGTGAGAAGGTTAGGTCCGGCTC 
AGACTGAATAAGAAGAGATAAAATTTGCCTTAAAACTTACCTGGCAGTGGCTTTGC 
TGCACGGTCTGAAACCACCTGTTCCCACCCTCTTGACCGAAATTTCCTTGTGACAC 
AGAGAAGGGCAAAGGTCTGAGCCCAGAGTTGACGGAGGGAGTATTTCAGGGTTCAC 
TTCAGGGGCTCCC^AAGCGACAAGATCGTTAGGGAGAGAGGCCCAGGGTGGGGACT 
GGGAATTTAAGGAGAGCTGGGAACGGATCCCTTAGGTTCAGGAAGCTTCTGTGGAA 
GCTGCGAGGATGGCTTGGGCCGAAGGGTTGCTCTGCCCGCCGCGCTAGCTGTGAGC 
TGAGCAAAGCCCTGGGCTCACAGCACCCCAAAAGCCTGTGGCTTCAGTCCTGCGTC 
TGCACCACACATTCAAAAGGATCGTTTTGTTTTGTTTTTAAAGAAAGGTGANAT 

Sequence ID 519 

CTGCGATNGAGTTTTGAGAGGAAGGANTAAAGTNCTCATCTCNGACGGTGAGAAAG 
ATCATNACTAAGGAAACGCAGGGTTGGAAGCAGTGCTGANTGTCCAGTTGAGTTTC 
ATGANCAAACATTTGCTGTGGGACCAGTTTTCATGGNGGTTTGTCATTTTGTCCAG 
CTGCCTGGAGCTGCTTGGTTGAAGGCACAGAATAATCAGGATTAATTGTTNAACTT 
GTATGAATTTCTTTATTTTAAAATAGGAATAATATCTGCCTTGGGAGCAAGTTGTA 
AGAGTTAACTGAAAGCTTNAGGAAAAACTTTCCCTTGCTATTTAAGTAGGGCTTTA 
CAAGTTACAATTCTATCACAGTTTTAAGATTATAAAC 

Sequence ID 521 

GCGGCGCANCTGCGGATCCANAAGGNCATAAACGANCNGAACCTGCCCAANNCGTG 
TGATATCACCTTCTNAGATCC^GAOSrACCTCCTCAACTTCAAGCTGGTCATCTGTC 
CTGATGAGGGCTTCNACAAGAGTGGGAAGTTTGTCTCAAAAAA 

Sequence ID - 523 nt: 585 

GATTTACTGTGGGAATTTGCTCATGCAATTATGGAAACCTAGAAGTCCCATAATAT 

GCCATCTTCAAGCTGGAATCCCAGGAAAGCAGGTGGTGTAATTCTGAGATTGAAGT 

CTTGAGAACCGGGGGAGTCAATGGTGTAACTCCCAATCTAGGGCTTAAGGCCCAAG 

GACCAGGGCTGCTGGTGTGCAGATGCAAATCCTGGAGTTCAAAGGATTGAGAACCA 

GGAGCTCTGGTGTCTGAGGGCAGTAGAAGATGGATGTTCCAGCTCAAGAAGGGAAA 

GTAAGAATCCGTCCTTCCTCCACTTTTTTGTTCTATTCAGATGAGCCCTCAATGGA 

CTGAACGATGCTCACCCACACTGTGAGGGCTGGTCTTCTTTATTCAATCCAGTGAG
TTAAGTGCTGATCTCTTCTGGAAACACCTTCACAGACACACCCAGAAATAATGTTC

TACCAGCCATGGGCCTGTTACTTAGCCCAGTCAAGTTGACACAGAAAATTAGCTAT 

CACAACATCTGTGTGTGTATATACATATGTATTTGCATGTGTGTGTATATATGGNG 
TATATATATTCATGTGTGTGTATAT 

Sequence ID 524 

CTTTTGCCAGTAGGCCCCCTGAGTAGGTTCGTCTATCTTTTGGCATGACCCCAGAA 

GTCTTTGATAACTTCCTTGCTTTCTGATGTGACAAGACATCCAGGGCCAGATTGTC 

CATATCCTGCCCCGGATGCACGATGCACTGTTTCTCCAAGAATCCCTGTGTCCTTT 

GCTGATGATGCCATGATTTTAAGTTCTCTAATATAGTTTTATCTCTTTGTTTCAGA 

TAATGCTTTTGTGTTCTCACATGTCCTGCTCTCTCTCTCTCTCTCATTTTGGTGTT 

GATCAGTCTTTCCATAAGATTGTTTATTTCACTAGTCCTTCATTCTTCTTTTTTCT 

AAATTTACTCTTCTTGACTAGTATCCTGTCACTTCTGAGGACTCATATTTTTGCAA 

CTTGAAAATTATTCTTATTTATTTAAGTATATGTTNCTGAAACTCTCATTAGACAC 
ATTTTG 

Sequence ID 525 

GTTAAAAAAAGTAAAAGGAACTCGGCAAATCTTACCCCGCCTGTTTACCAAAAACA 
TCACCTGGTAGCATCACCAGTATTAGAGGCACCGCCTGCCCAGTGACACATGTTTA 
ACGGCCGCGGTACCCTAACCGTGCAAAGGTAGCATAATCACTTGTTCCTTAAATAG 
GGAGCTGTATGAATGGCTCCACNAGGGTTCANCTGTCTCTTACTTTTAACCAGTGA 
AATTGACCTGCCCGTGAAGAGGCGGGCATAACACAGCTGAAAAAAAAAAAAAAAAA 

AAAAAATTTT 

Sequence ID - 526 nt: 516

CTTTTCATGGTCTCTTGTTCATTAATCATCTAAAATCCAAGCNCAGAGAATTCAAT 

TTTAGATGGTCTCCAGAGCAGAATTTGATGTATAATCTTAATTACAAATCATAGAT 

AATTAATATTGNTTACAAAATCANAATACGATTAGAGGTAGGGATCCTGCACACAC 

CCTATTTTCCTCCCCAGTGTTCTGACCGAGAGACTAATTAATAATTCAAGGAACTT 

ACAGTGAATGANAACCCATGGTTTTGCTTAATTATCAGAACAGCTAGATCTGAGAA 

CAGCTGTCTCCCACATGGATAGACACTTATTCCACCCATTTGCAGGTAGAATAGCT 

GGCAATAATAAGTCCTTCCCATTGGATATGTTGAAAGGTGCCTGCCATGGCATAGT 

TGCCACAAGAGAGGAAGAAATGGACACAAATGTAGGCTGTTTTCAGGGCANAGGGA 

AGGTGGGAGGAAACCAANTTGCTGGTTTTCACACACCCTCTGGGGAACACCCATGC 
ACCTATGANATG

Sequence ID 527

GACAAAAGCTGAGAGAATTTTTTTCTTGAATATTTGCACTAAAAGATAGGTTAAAA

TTCTTCAGGCTGAAGAGAGCATACCAGGTC3GAGATTTGGATCTACAAAAAGGAAGG 

AAGATTTGGAAATGGATTTGGCACCATTGACTCAATTTCCAGAACAAGAAAGCAGG 

GACAGTTTTGGGAAGCTCAAGACACACTGCCCATGAGCAGCAATTTGGACCTCCTG 

CTGCATCCACTGTGCATCAAACACACACTGTACAGACAAAGACTCCCAGGAAAAGA 

AGTATAAACATGGACTAACACAGAGATGGGCAAACTACAGCCTGTGACCCAGCCAC 

CTGTTTATGTAGAATCCAAAGTAAGAATCTTTAACTTACACATAAACTT 

Sequence 529/ 660nt 

GACAGCAGAGC^CACAAGCTTNTAGGACAAGAGCCAGGAAGAAACCACCGGAAGGA 

ACCATCTCACTGTGTGTAAACATGACTTCCAAGCTGGCCGTGGCTCTCTTGGCAGC 

CTTCCTGATTTCTGCAGCTCTGTGTGAAGGTGCAGTTTTGCCAAGGAGTGCTAAAG 

AACTTAGATGTCAGTGCATAAAGACATACTCCAAACCTTTCCACCCCAAATTTATC 

AAAGAACTGAGAGTGATTGAGAGTGGACCACACTGCGCCAACACAGAAATTATTGT 

AAAGCTTTCTGATGGAAGANAGCTCTGTCTGGACCCCAAGGAAAACTGGGTGCANA 

GGGTTGTGGANAAGTTTTTGAAGAGGGCTGAGAATTCATAAAAAAATTCATTCTCT 

GTGGTATCCAAGAATCAGTGAAGATGCCAGTGAAACTTCAAGCAAATCTACTTCAA 

CACTTCATGTATTGTGTGGGTCTGTTGTAGGGTTGCCAGATGCAATACAAGATTCC 

TGGTTAAATTTGAATTTCAGTAAACAATGAATAGTTTTTCATTGTACCATGAAATA 

TCCAGAACATACTTATATGTAAAGTATTATTTATTTGAATCTACAAAAAACAACAA 

ATAATTTTTAGATATAAGGATTTTCCTGGATATTGCACGGGAGA 

Sequence ID 529 

GACAGCAGAGCACACAAGCTTNTAGGACAAGAGCCAGGAAGAAACCACCGGAAGGA 

ACCATCTCACTGTGTGTAAACATGACTTCCAAGCTGGCCGTGGCTCTCTTGGCAGC 

CTTCCTGATTTCTGCAGCTCTGTGTGAAGGTGCAGTTTTGCCAAGGAGTGCTAAAG 

AACTTAGATGTCAGTGCATAAAGACATACTCCAAACCTTTCCACCCCAAATTTATC 

AAAGAACTGAGAGTGATTGAGAGTGGACCACACTGCGCCAACACAGAAATTATTGT 

AAAGCTTTCTGATGGAAGANAGCTCTGTCTGGACCCCAAGGAAAACTGGGTGCANA 

GGGTTGTGGANAAGTTTTTGAAGAGGGCTGAGAATTCATAAAAAAATTCATTCTCT 

GTGGTATCCAAGAATCAGTGAAGATGCCAGTGAAACTTCAAGCAAATCTACTTCAA 

CACTTCATGTATTGTGTGGGTCTGTTGTAGGGTTGCCAGATGCAATACAAGATTCC 

TGGTTAAATTTGAATTTCAGTAAACAATGAATAGTTTTTCATTGT 

Sequence ID - 530 nt: 660

GACAGCAGAGCACACAAGCTTNTAGGACAAGAGCCAGGAAGAAACCACCGGAAGGA
ACCATCTCACTGTGTGTAAACATGACTTCCAAGCTGGCCGTGGCTCTCTTGGCAGC

CTTCCTGATTTCTGCAGCTCTGTGTGAAGGTGCAGTTTTGCCAAGGAGTGCTAAAG 

AACTTAGATGTCAGTGCATAAAGACATACTCCAAACCTTTCCACCCCAAATTTATC 

AAAGAACTGAGAGTGATTGAGAGTGGACCACACTGCGCCAACACAGAAATTATTGT 

AAAGCTTTCTGATGGAAGANAGCTCTGTCTGGACCCCAAGGAAAACTGGGTGCANA 

GGGTTGTGGANAAGTTTTTGAAGAGGGCTGAGAATTCATAAAAAAATTCATTCTCT 

GTGGTATCCAAGAATCAGTGAAGATGCCAGTGAAACTTCAAGCAAATCTACTTCAA 

CACTTCATGTATTGTGTGGGTCTGTTGTAGGGTTGCCAGATGCAATACAAGATTCC 

TGGTTAAATTTGAATTTCAGTAAACAATGAATAGTTTTTCATTGTACCATGAAATA 

TCCAGAACATACI'TATATGTAAAGTATTATTTATTTGAATCTACAAAAAACAACAA 

ATAATTTTTAGATATAAGGATTTTCCTGGATATTGCACGGGAGA 

Sequence ID 532 

GAATTGTGATAGTTCAGCTTGAATGTCTCTTAGAGGGTGGGCTTTTGTTGATGAGG 

GAGGGGAAACTTTTTTTTTTTCTATAGACTTTTTTCANATAACATCTTCTGAGTCA 

TAACCAGCCTGGCAGTATGATGGCCTANATGCAGAGAAAACAGCTCCTTGGTGAAT 

TGATAAGTAAAGGCAGAAAAGATTATATGTCATACCTCCATTGGGGAATAAGCATA 

ACCCTGAGATTCTTACTACTGATGAGAACATTATCTGCATATGCCAAAAAATTTTA 

AGCAAATGAAAGCTACCAATTTAAAGTTACGGAATCTACCATTTTAAAGTTAATTG 

CTTGTCAAGCTATAACCACAAAAATAATGAATTGATGAGAAATACAATGAAGAGGC 

AATGTCCATCTCAAAATACTGCTTTTACAAAAGCAGAATAAAAGCGAAAAGAAATG 

AAAATGTTACACTACATTAATCCTGGAATAAAAGAAGCCGAAATAAATGAGAGATG 

AGTTGGGATCAAGTGGGATTGANGANGCTGTGCTGTGT 

Sequence ID 533 

CTTGAACCTCGGAGGCAGAGGTTGCAGTGAGCCGAGATCACGCCACTGCACTCCAG 
CCTCGGGGACAGAGCAAGACTCCATCTCAAAACACACACACACACACACACACACA 
CACACACACACACAAAACAGATATACACTGAACACAGCACAAGTGGGACATAAGAG 
ATTTAAAAGGGTTAGAGATGTAAAATGGATCTAGGAATGGAAACCATAAGGNGGGA 
TTTATCAACTGGATTCTGCANAATGCTGTTAAGGCCAGATGTTAGCAGGTGTTACA 
TAAAAAAGGGATACCATGAGCAAAAGTATTTGAACATGGGCAATGGTTGAAACAAG 
TTTAAACAGATTATNTTTATTACCAAATCTCTCAAACCTTTAATATGCTATAAACA 
TTGTGAAACAATAAAAAAACTTTCCAAAA 

Sequence ID 534 

GGGAAGGGAGCTATGAGTGTGTGTGTTGTGTATGGACTCACTCCCAGGTTCACCTG 
GCCACAGGTGCACCCTTCCCACACCCTTTACATTCCCCAGAGCCAAGGGAGTTTAA
GTTTGCAGTTAC^GGCCAGTTCTCCAGCTCTCCATCTTANAGAGACAGGTCACCTT
GCAGGCCTGCTTGCAGGAAATGAATCCAGCAGCCAACTCGAATCCCCCTAGGGCTC 

aggcactgagggcctgggga^gtggagcatatgggtgggagacagatggagggS 
ccctatttacaactgagtcagcc^gccacWgggaatatacagatttaggtgc 

TAAACCGTTTATTTTCCACGGATGAGTCACAATCTGAAGAATCAAACTTCGATCCT

GAAAATCTATATGTTTCAAAACCACTTGCCATCCTGTTAGATTGCCAGTTCCTGGG 
ACCAGGCCTCANACTGTGAAAGTA

Sequence ID 560 

GGCGGAGGTTGCAGTGAGCTGAGATGGCGCCATTGCTCTCCCAGCCTGGGTGACAA 

GAGCAAAACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAGCAATTTACTTAAA^ 

ATACAAACACAGAGACAAGTATTTTTGAGAAACAAATACCTTTTTCATTTTTTATA 

CCAATGTAACAATAATCCATTAAACACACCTTTACTAACTGTTTTCTAGGAGTCTG 

ATATGATGAGGAAATAGGTAAACCTTTAATAGCCAGTACTAAATTAGAGTGGCACA 

ACTTTCACTGGGAAAAAAGATGGGTATTTTACTTTTCTGTTTTAGAAAAGTGGCTT 

GACAACAGTATGCTTATGTCTTAGAGTTTGAAATTCAAGTTCTTGAACATTATTAA 

TGGCTACAATCATTCATACCCACATTGGGCTGTATTCTTGATGAATCCAAAGTGAT 

TTTCACCTCAACTCTGAATTTCATTCTCCTCTTTTGAATATAATACAACCATCTCA 

CTAGAGGAAGCATTTCAGTCTTTTCTGATTGGAGATTCATTATTGTTTTAGATAAT 

GTTTTCATTTGCTTATGGGTATATAAAAAATTTTATCTTAAAAATATTTCCTCTCA 
TTTAGCTAGCAACATTGTTTTC 

Sequence ID 561 

CTCAGGGTGATCTCTGAACCCAAACTTGCCCCAAAGAAGGTTGCTCTGTCCTCTCC 

ACATCCCCATCTCCTCCCTAGGGCCTTGTTGGGGAGAGGCTCCTCCATCTTTCCCA 

AGTCACACCATCGTTTCCTACGTGGTCTGGACAAGAGCAAGAGCACACCTTGTCCC 

CACCTTCTCCAGAGCAGCCAGAACCCACCTCAGGTGCCTTCCCCATCCGGTGCAGT 

TAAGGCACTTCTGCCAGCACCATGGTATGAGCACTAGACTTGGAGTTAAGATTTGA 

GAGCCCCCTCTGTCACTGTGGAAGCTTGAGCATGTTGCTTGATCTCTCTGAACCTT 

GTGTTTCTCATCTGTGAAAGGTGATAATGTGGGGCTGCTGTGAGATTTAAAGGACA 

TAATGCACCTACGGTCCAAGCACTGCCTGGAATACAGCANAAGCTCAACAGATACT 

GGACAACCCATCCCCTTAGTAGAGGCACTAACCATGTGACCCAAGGCAAAAGTGCT 
TAAAAAAA 



Sequence ID - 562 nt: 580

ATTGCATGCAAGTTTGCTGAGCTGAAGGAAAAGATTGATCGCCGTTCTGGTAAAAA 
GCTGGAAGATGGCCCTAAATTCTTGAAGTCTGGTGATGCTGCCATTGTTGATATGG
TTCCTGGCAAGCCCATGTGTGTTGAGAGCTTCTCAGACTATCCACCTTTGGGTCGC

TTTGCTGTTCGTGATATGAGACAGACAGTTGCGGTGGGTGTCATCAAAGqACTGGA 

CAAGAAGGCTGCTGGAGCTGGCAAGGTCACCAAGTCTGCCCAGAAAGCTCAGAAGG 

CTAAATGAATATTATCCCTAATACCTGCCACCCCACTCTTAATCAGTGGTGGAAGA 

ACGGTCTCAGAACTGTTTGTTTCAATTGGCCATTTAAGTTTAGTAGTAAAAGACTG 

GTTAATGATAACAATGCATCGTAAAACCTTCAGAAGGAAAGGAGAATGTTTTGTGG 

ACCACTTTGGTTTTCTTTTTTGCGTGTGGCAGTTTTAAGTTATTAGTTTTTAAAAT 

CAGTACTTTTTAATGGAAACAACTTGACCAAAAATTTGTCACAGAATTTTGAGACC 
CATTAAAAAAGTTAAATGAG 

Sequence ID 563 

GCAACCTGCACAACCCCGCCCTGTTCGAGGGCCGGAGCCCTGCCGTGTGGGAGCTG 

GCCGAGGAGTATCTGGACATCGTGCGGGAGCACCCCTGCCCCCTGTCCTACGTCCG 

GGCCCACCTCTTCAAGCTGTGGCACCACACGCTGCAGGTGCACCAGGAGCTGCGAG 

AGGAGCTGGCCAAGGTGAANACCCTGGAGGGCATCGCTGCTGTGAGCCAGGAGCTG 

AAGCTGCGGTGTCAGGAGGAGATATCCAGGCAGGAGGGAGCGAAGCCCACCGGCGA 

CTTGCCCTTCCACTGGATCTGCCAGCCCTACATCCGGCCGGGGCCCAGGGAGGGGA 

GCAAGGAGAAGGCAGGTGCGCGCAGCAAGCGGGCCCTGGAGGAAGAGGAGGGTGGC 

ACGGAGGTCCTGTCCAAGAACAAGCAAAAGAAGCAGCTGAGGAACCCCCACAAGAC 

CTTCGACCCCTCTCTGAACCAAAATATGCAAAGTGTGACCAGTGTGGAAACCCAAA 

GGGCAACAGATGTGTGTTCAGCCTGTGCCGCGGNTTG 

Sequence ID - 564 nt: 671

GGAATAGAATTTTAAATAGTAATAACTGCTTGTTTTTTTTGTGCAAGTACTTTTAT 

ACATAAGATAAACAAAAACCTTACCACCAAACATACCAAAATGCACCTCTTTCATA 

AGTGAGTTACTAAGATTTCTATACCTGGAATATCATGTATGTTTCATTTACTGGAT 

GTTTACATTTTAGGAAGGAAAATAGTTTTGTTTATTTAAACAACTGAATACTTATA 

AACTGTTGTTCCTGGAAGTTATTTATTCCATAAAAAATTTGTTCTTTTGTCATGAA 

TTTATAATTCCTAAATGAAGACCAGAAAGTACAAATTGCTGGGAGGAAGAATAGGC 

TTTATTAATCAACTGATGTCTTGATTTTTCTAAATGGGAAGATTGCTTTATTTTTA 

ACACTAATTATGGGAGCAGATTCTTAGCAAACTTCTTTGGAAAAGTTAATGTTATG 

ATGTGCATTAGGCTGCCCCATCGTGTATATAAATGAAGCAGATTTGATTTTTGTAT 

TCTTACGTTTCTCTGCTTTGTAGTTGTGGCTGTACTTAAAGAAATACAGAATTTCA 

TATATTTAAAAATGTTTAAAATGTGACCCACAGACATTGTAAATGGATTNAAAACT 

AACATGAAAAATATTCAACCTAAAAGAATTCTTAACTTCACAAGTGTTTTACTTC 



Sequence ID 565

CTTGGTTCCGCGTTCCCTGCACAAAATGCCCGGCGAAGCCACAGAAACCGTCCCTG

CTACAGAGCAGGAGTTGCCGCAGCCCCAGGCTGAGACAGGGTCTGGAACAGAATCT 

GACAGTGATGAATCAGTACCAGAGCTTGAAGAACAGGATTCCACCCAGGCAACCAC 

ACAACAAGCCCAGCTGGCGGCAGCAGCTGAAATCGATGAAGAACCAGTCAGTAAAG 

CAAAACAGAGTCGGAGTGAAAAGAAGGCACGGAAGGCTATGTCCAAACTGGGTCTT 

CGGCAGGTTACAGGAGTTACTAGAGTCACTATCCGGAAATCTAAGAATATCCTCTT 

TGTCATCACAAAACCAGATGTCTACAAGAGCCCTGCTTCAGATACTTACATAGTTT 

TTGGGGAAGCCAAGATCGAAGATTTATCCCAGCAAGCACAACTAGCAGCTGCTGAG 

AAATTCAAAGTTCAAGGTGAAGCTGTCTCAAACATTCAAGAAAACACACAGACTCC 

AACTGTACAAGAGGAGAGTGAAGAGGAAGAGGTCGATGAAACAGGTGTAGAAGTTA 

AGGACATAGAATTTGGTCATTGTCACAAAGCAAATGTGTCGAGAGCA 

Sequence ID 566 

GTCACCAAGAGCTTGTTGTCAGGTTTTCACTTGCTATTCGCAGAGATTTTTTTTAA

AGGCACTATTTGTAGTGTTAAAAGGGTGAATTTATCANAAGGCATAATAATCATAA 

ATGTGTATATGCCTAATAATAGAACTTTAAAAGGCATGAAGCAACACTCAAAAGGA 

TTAAAGGGAGATCATCTCACCCCCTTCTTACCAATTGATAGAATGATCTGATGAAA 

ACAGTAAAATAACAACAGATCTGAACACTGTCAACCATCTTGACAAATACTTATGC 

CTAGTGTTCCATTATTGGAACACTAAACATGTGGAATGATTTATATCCTACTGCTC 

AAGGTCATCACCAAGGTCTAATTGTAAAATTTCAAAAAATTGCAACCTCAGGCATA 

AATGGGTTAATCGACATTTATAGCACACACATGCAACATGTACCAGAGATTCCTTC 

TTTTCTATGAACATGGTACTTCCACCAAGATAGACCACATTGTGAACTATAAAACA 

AATCTAAAAACATTTGAAATGAAGGAAATTATATAAAATATGTTCTCTTGATCTCA 
ATGAAATTAAATTAATACTATAT 

Sequence ID 567 

CTCATGGCGGCCAATGTAGGCCCAAAACTTCCTCAAGTCAAACTCTCCAGGCCCAC 
CTTCTGCTTCCCGGTGGCATCAACAGGCCCAGCTTTGACTTGAGAACAGCCTCTGC 
AGGCCCTGCTCTTGCCTCCCAGGGGCTTTTTCCAGGCCCAGCTCTTGCCTCATGGC 
AGCTGCCCC^GGCCA^TTTCTGCCTGCCTGCCAGCAGCCTCAACAGGCACAGCTC 
CTCCCTCACAGTGGCCCATTTAGGCCCAACTCATGACTGTGAGGCCATTTCCAGGC 
CTAGTGCCTGCCTCGTGGCTGACTCTTGAAGCCCAAAACTTCCTCAAATCAGCCTT 
TTGCCCAACTTCTGTCTACTGTCGGACTCTACAGGTCAGCCTCTGCCfCACAGTGG 
ACCCTCCAGACCCAGATGGTGTCTNCTGTGGCATCCTCAGGCGAAGCTCCTGCCTT 
TCGGC^GCCTCTCC^GGCCCAGCTCCTCCTGCTCC^GCCTTCTCTCCAGGCTCTGA 
ACTTTCTCAGGTCTCCCTCTGTTGTCCAAGGCTGGAGTGTAGTAG

Sequence ID 568

TATATATGTAATGCCCTTAACCTAGTGTTTGGCATGATCGTTGCTGAAAGGGAAGC 

TTGTGGGTACAGTGTCCCCTCAGAAGCCAAAGCCCAGGGAAGGTCGCCTGCCCAGG 

TCAGGCTCCCAGCGAGTTTGTCTGGGGAGGGGCCATTCATACCTCCAGGTCAGGAC 

AGAGGCTCGGGCTGAGGGAACCCTACACAGGTCCTGGAAGCAGATCCTTCCTGCCT 

AAGCCAGCAGGACAGCTCAACAGGAAGCATCTTCCAGCCACGGGAGGAGAGGCAGC 

ACCTTTTTTGGAACCATACAGAGCTAAGAATGGTGGTACAAGTAATAGATTCTGTA 

CTGGCAACCCCACTTGGTGGAGCAAGTTCTAGGAAAAGGGGGCTGTCCTTGAGTCA 

GCCATGGGGTCAGCCACACAGTCACCGCAGCTGCTCTTTGGCACCGGGCGCTGGAA 

AGACCTAGGATGACACAGCCTGGAAAGAGCTTGGGAAAAGCTCATCTTCCACAGAA 

CTACCTGCTATACCAGCCAGGGCAGGTGCTTATTCCCACAACAGCCCTCTGTTGTA 

GGCGGCAGTGCCATCCTGAANGTGCCGTGGTACCTTCTGAANACCCAGCTGAGGGC 

CTGTAATGGCACTTGCATGCCACATGGNACACCCTTTCCCGGTTAA 

Sequence ID 570 

ACCGCGGCCGCGTNAANAAAAAAAAAAAAAGAATTCCACTTGATCAACTTAATTCC 

TTNTCTTTATCTTCCCTCCCTCAGTTCCCTTTTCTCCCACCCTCTTTTCCAAGCTG 

TTTCGCTTTGCAATATATTACTGGTAATGAGTTGCAGGATAATGCAGTCATAACTT 

GTTTTCTCCTAAGTATTTGAGTTCAAAACTCCTGTATCTAAAGAAATACGGTTGGG 

GTCATTAATAAAGAAAATCTTTCTATCTTAAAAAAAAAAAAAAAAAAAAAAAAAAA 

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA^ 
AAA 

Sequence ID - 571 nt: 457

TTAGAGAGGTGAGGATCTGGTATTTCCTGGACTAAATTCCCCTTGGGGAAGACGAA 

GGGATGCTGCAGTTCCAAAAGAGAAGGACTCTTCCAGAGTCATCTACCTGAGTCCC 

AAAGCTCCCTGTCCTGAAAGCCACAGACAATATGGTCCCAAATGACTGACTGCACC 

TTCTGTGCCTCAGCCGTTCTTGACATCAAGAATCTTCTGTTCCACATCCACACAGC 

CAATACAATTAGTCAAACCACTGTTATTAACAGATGTAGCAACATGAGAAACGCTT 

ATGTTACAGGTTACATGAGAGCAATCATGTAAGTCTATATGACTTCAGAAATGTTA 

AAATAGACTAACCTCTAACAACAAATTAAAAGTGATTGTTTCAAGGTGATGCAATT 

ATTGATGACCTATTTTATTTTTCTATAATGATCATATATTACCTTTGTAATAAAAC 
ATTTTTCCC 

Sequence ID 572 

CGTCTATTTGNGTTTCTTCTCACAATTGGTAAGTTCTCTGTATTGATTGATGGCTA
AGTTTGATTAGTGTTTTTCTCTAGTTGGTAATTATATTCTAGTATTTTATCATCTT

ATTGTTTACTCAACTNAAAGTGNCACAGAAGAGTTGCCAGGTTTCTCTTTGATATG 

AGATCTCTlSnOTTGATTTGGAATGCAAATCANAAGTGTCATGTTTTGAATAAAGGGA 

CCAGATGACTTATAGGTATTCTTTCTCTAAATATAACTAAGGTAAGATTTTTGTTT 

TGAGGTACTTAATCTATATAAGTGGTAAAGAATTTACTTGAATTTCTCCAAATTCT 

CATGTCTAAAGTCTGATTGATTAAATTCATTCTTGGTATTTCATTTTGAAAAGAAT 

GTAGCTTTAGCAAACCTCTTTGTATAAATGCAGTGGGATTAAGGTCATTTAAAAAA 

TTGTTATATCATTGTATTTTTAAAATTTACCAGTTTTATTTTTCTTTTTACCCTTT 

AGCCCGGCCTCAGAAAGTGTGTTTGTGTCCATTTCTCCCAGCGCACCCTCTGCATA 

tctctacccactt'gtcataattcagcatccagcagaggaaaacaaagtgttgcgta 

CAGTTCCTCTACTAGCAGCATGCCTCCCCCAGGACAAGTGTA

Sequence ID 574

TTATTGCTGACATAAAAATGGTGCACATCGGCCAGGGCCCAGGATGAATCAGCCAA 

TCTGCACCATTTATACATGGAACTGGAGAACATTGTGCCAATAATCATTTAATATA 

TGCCAAATCTTACACGTCTACTCTAAACTGCTCTAATGAAGTTTCAGTGACCTTGA 

GGGCTAAAGATTGTTCTTCTGGGTAAGAGCTCTTGGGCTGGTTTTTCANAGCAGAG 

TTCTTGTTGTGGGTAGACTGTGACTAGGTTCACAGCCTTTGTGGAACATTCCGTAT 

AACGGCATTGTGGAAGCAATAACTAGTTCCTATGAAAGAACCAGAGCTGGGAAGAT 

GGCTGGGAAGCCAGGCCAAAGTGGGGGCAACAGCTTGCTTCTCTTTCTCTTCTCAC 

CCTCAGTTTGTATGGGAAAATGGAGATGTCCTCTCCACTTTATCCCACGATATCTA 
AATG 

Sequence ID - 575 nt: 209

CAGGATATCGAGACCATCCCAGACAGCATGGTGAAACTCCGTCTCTACTGGAATAC 

AAAAAGTTAGCCGTGTGTGGTGGCACGCGCCTCTAATCCCAGCTATTCGGGAGGCT 

TAGGCAGGAGAATTACTTGAACCCGGGAGGCGAAGGTTGCAGTGAGCTGAGATCGC 
ACCATTGCACTCCACCCTGG - CGACAGAGCAAGACTCCGTCT 

Sequence ID - 576 nt: 541

CAGCCAACCCAGAAGGAGCCAGTCTACAACTATGCCTGATCCTCCTCATGGCAGGC 
CACGAAGCATTGCTGCCATGTGTTGAATTATAAAACCCACATTGCTTTTTGAACCC 
TGTTGCGGGTAAAAATAACCAAATTATCAGTCCTTGGAAACCCAGGCAATCAAGTG 
AGTACAAGGTAAAGATAAGTATGGTTTAGAGGAGAAATTATGTTCCTGAACTGGTG 
TCCTTTGATGGCAGCGTCAGCCTTGCTAAGTCAGAGTAGAGGGAGCAGTGACCTTA 
ATAAGCTTTGGTGAGCATCATGTGCACGCGTGGGTGGGAGTCCCTTTCACTGATGC 
TTTTAAAAGTGCTTTTGCAGACCCTGGAAGGGATCCTCCACACATATGAGGTGTGG
GACAGGTAGGCCAGAGAGGATTAGCCCTGCTTTCGAGACTAGAAATCTACAGTCCT

GAAGGAGCAGTAATTAATTGGTACACCTGTCAGGGCCAGCCCCCAGGTCTCCTGGC 
TTTTTCCAGGTTTTCTGTCTCACATGATTTTGCTTTT 

Sequence ID 577 

CTTTAATTTTTCAAGTGTTTAAAAAACAATTTTATACTTAAGCCAGCCTTGAAGAT 

AAGCACAAAATTTACCAGTTTACATTTAAAAAACAAACAAAAAACGACAACAACTC 

AAGCACCCGCTCTGTGCATAGCACTATTCTAGGTGCAATAAAAGGGAATCTTAACC

TTAGAAATATGAGTTCACTTTCTGGAATTGTATTATCTCCTTTTCCAGAGAGTAAA 

AATAAATAAAATCACCATTGTTTACTACAGATCTGCCCCAAACCACATCTGGTTCA 

CAGAAAGGCTAATTTCTGCCAAATTAAAGATGTAATGAACTCAGTTCCTGCTTTCC 

CAAAAACACGAAAGCAGAATTCCTTTTCACTGAAAAAAATAAACAGTTTTCCATGC 

AAGGGCAGTTTGCTTCTAATAAGTATTTTTTAAAAAATTTTTTTTTCCTCTAGCTT 

TTCTTTAAATTTTCTTCCTCTAATATTGCCTTTTCTTGTACAAGGCAGACCAGGTA 

TCTTTTTATGCTGTTTTTCCTTTACTAAGAAAAGTATTGCATCTTGAAGACAAACC 

ATTTCCCAGAGTAGTGATAAAAAATAACACTAAAAAAACTTTAAAGGTGAGTCACT 
TCATCACCTTGATGAAGTAAAAAA 

Sequence ID 578 

GGAAAAAATATTTCCACTTAGATATTTTACATGGTTTTGTTTAAAATTACCATTAC 

TTGTTTTTTAAAAACACATGACCACATATGTATATGTATATCTACCTAAACATTGT 

ATCATGGTTTCAGTATGTTATTCATGTATTACTGGGAGATGCTACCAAGAAACCAA 

CCCAAAGAAAATTCTGGAAAATACATTTCTATTTATAGAATAAATGTTTCATTTAT 

ATAAAAGCAAAAGAACTTAGAGTTCTAATAAATGGGATGTCTAATAAATTATGAAG 

TTACTGATTTGAATATATTATATTTTTATAACTTCCTTGCCAAAGTCCTGATTTAG 

TACATTAGAGAACCTGTGTTTCCTCTCTCCTCTACCATTCATCTCTCTTCCATACA 

GTCATTTGGGCTTTTTACTCAAAGAGAATCAAGAAATAATAAGGTATAACAAGCTT 

GGCAAAGTGTTGGCTTTTTAAAAAAAAATTTTTTTAATCTCTAGCAGTTTGGTAAT 

TTAGCAGCATCATTTATTTGGGATTCTTTTATCTGATTTCAACAGTGAAAAACATC 

CCTATGATAAAGCCTAATGACCCATTTCCAAAAGATGGAATTGCCCTTCCTAGAAA 
ATATGACGGAGAAAAGT 

Sequence ID - 579 nt: 502

CGAATAGCCAAGTGGTCTGACAAGATCGAGAGTAATGAGGCCCATACTTTAGTACA 

GTCTTGAATGGCCAGATGGTGCTGGGCATACCCCAACCAGAGATATGTAAGTCTTT 

ATGTTGTCAAAATTTCCCAGAAACATGAATTTCCCACTAAGATTCATTAAGGAAAA 

CTAGAATGAAAAC^AAACGTTCCTTGTATAATATTCATTANAAAGAAATGAAGAA
GGCCGGGCATGGTGGCTCACGCCTGTAATCCCAGCACTTTGAGAGGCCAAGGTAGG
CAGATCATGAGGTCAGGAGTTTGAGACCAGCCTGGCCAACATAGTGAAATCCCGTC 
TCTACCAAAAATACAAAAAAATTAGCCGGGCATGGTGGCACACACCTGTCATCCCA 
GCTACTCAGGAGGCTGAGGCAGGAGAATTGCTTGAACCTGGGAGGTGGAGGTTGCA 
GTGAGCTGAGATTGCACCACTGTACTACAGCCTAGGTGACAGTGCAAGACTCTG 

Sequence ID - 580 nt: 316

CCTATGCCAAACTAAAGAAAGCTTGCCTGGCCTACAGGCCTAAAGGTTCAAATGNG 

GATTAAAAAAACACAGTAGTCACATAAAATGTCTGCTGGCTGGCTGGAATTCCATC 

ACCTACAATTTACCTGCTTTCAAAAACTGTGTTCAACATTGAGAAAACAGAAAACC 

ACTTATCTTGAGCTTAATATGGGCTTCTTTTTCCTTAACTGTAGAACACTTACTGA 

AATATCAAATCAATGGTTAGGATATGTATCCTAGGCAGGCCTAAACCATTAACACT 
TGGTTTAAGCAACTTTGTATAATTNACCTCCTAAAT 

Sequence ID 581 

CTTCATGAGTGCCCGGTTGCCCAAGTCAAAAACCTGGGAGTGATATAAACTCCCCA 
CACATCCAGTCAGTCACTCATCAACTCTATTGATTCTG - CTGCTAAATATATCTCA 

ATTGTATTAACTTAAACATATGCATAATACATCTTCTTCTTCACTGCATTTTTGTG 
GGCTGCACTTACCTTTCAGGTAACAACAACACTGGCCCCTCTTGCCCTTCTAGTCA 
GAAGTGCCAAAATGATGAGAGCTAGCCATGACAAACCCACAGCCAACATTACACTG 
AATGTGCAAAACTGGAAGGGCATCCAAACAGAGGAGG 

Sequence ID 582 

TAGAATTCTCGCCTGCCTTGGCTTCTCCCTCTAGTTGTTCCTTCTCTGTCTTCTGT 

GGGCTTCTTATTGTCTGCTCACTCCTTCTTCAGTGTCCTCTCATGGGCTTCCTTCC 

CTTCTCAGCTGATGCCATCACCTGGGGAATCACAGTTACTCAGCAGCACTGGGGCC 

TCTCTATCTCTATGCTGGTCATGCCTATGTGTGAGCTGCAGACCCAGTGGAATTTC 

CATTTGTGCATCCCATGCCCAGCCCACCCTCCACCAGCCTCGAATGCAGCTGTTCA 

GCCCTACCCCAGTCCTCAGAAAAGTTCCTCTCCCTGGATCCTGTTTTTCCTTCATG 

AGTGCCCGGTTGCCCAAGTCAAAAACCTGGGAGTGATATAAACTCCCCACACATCC 

AGTCAGTCACTCATCAACTCTATTGATTCTGTCTGCTAAATATATCTCAATTGTAT 

TAACTTAAACATATGCATAATACATCTTCTTCTTCACTGCATTTTTGTGGGCTGCA 

CTTACCTTTCAGGTAACAACAACACTGGCCCCTCTTGCCCTTCTAGTCAGAAGTGC 

CAAAATGATGAGAGCTAGCCATGACAAACCCACAGCCAACATTACACTGAATGTGC 
AAAACTGGAAGGGCATCCAAACAGAGGA 



Sequence ID - 583 nt: 631

CTGAGGTGGGAGGATTCCACTCTCACCCATTTCTTCTTTCATTTTCAGTTTCTCCA

GTTAGTAACTGAAGATGTTCTTTGAGTAATTAAGTGAGTGAGAAAATTTTTAAGTG 

AGAAATCTATAAAAAGAACCATGTTAACATAAATATTTCAGTCCTTACAAGTTGGT 

ATTGACTTTTCTCATTGGTAATCTGACTGATTTAATACTGCTCATTCCAATATCTG 

GTGATGTAATTCTGGTTATGAATCCTTGTATTAATAACACCTCCTGGGAGGTTTTT 

TTTCCCCAACATTACATTCAGAATATTAGAGCTGAAAATACCTTTTTTAAGGTTAT 

CAGGAGGAGGGAGCTTATGTTTAATGTGGTGGATAAAACTTAACTGCTGGTTAATA 

CAATTGTTATTCAGGTGAAATTCCCTAAACTTTTCACGTGCAAAGTTTTGTATGTA 

TACAGACATTTGGGGAAAAGTTTTATCATCCCTAAAACCGGTTACTGTCCAGAAAA 

TGATAAGAATCCCTGGGTTCCAAATCCTTCATAAGGTATTTATTCATTTATTTATT 

CAACACATTTACTCAATGCCTCCGCTCTGCTGCAACTACACTGACATTCTGCTTCT 
AATCTAACCGAAAAT



Sequence ID 585 

TTTCAAATTGTACAATAACACAAACAACTTTGTTAAGGCCATGTTTTATTTGCTGA 

TTAATGGACAAAAGGCAATGTAATTTATTTTCAAGTATTTTCTTGAAAGTCTGTGC 

TCATAAAAATCATGAAAAGTTGGAAAGACTGTTAAATCACTGAAACTTCAAATATA 

TCTTACACAATCTTGTTTGTACAAAAATACAAGTTAAATATAAACATAAAGCAATC

ATGGTAATTTTATGCAAATCTGTTTTATGTGATCATCAGTTATATATAAAAGTTTC 

TCAGTTCTGTTATTTGTGAAAAGATCAATACCAGATTGAATGACTACCTATTGGCA 

AAGGGCCCTAAAAAGCTTACTTTAGCACTCATCTTTTACATGGTTAAATGCATTTC 

CTAATTTGAGATCACCTAAACACTGGAAAAGAAAAAAAATGAAAGGGCAGTATGTC 

CATAAACCAACAAATAATTTGGCTGTAATGTATCATAAAACACAAACCCCACACAT 

CTGTACAATAAACATTATGTATTACATACACACAACACACACCCAGTCATAAAGCC 

TAATGATGTGCTGCTTCCAGTTCAATATTCAGCTGTGCATTTTTTCTTATTTCATC 
AAATGAATAGCTTTTTGTCACC 

Sequence ID 586 

GTAAACTGTTCTCTCCGAGGGAAAAAATGGAAGTTATCCTCACAGTTCACTGCCGT 

GGTATTTCTTCTGTCCCATGCTTTGCATGACTGCCATGGTACAGCCTTGTTTCAAA 

CTGTTCACTGTGATCTGTGGGTCTTTGAGTTTCAGTGAGTTTGCTGAAATGTCGAA 

GAAGTAGTTCCAAACTTCAATGTTCAATGAAATTTTTGTTCAAGTTTGAAATGGAG 

AGAGCAGCTTTAAAAGGTACTAAGCCTTTTACAAATTGGTGAGTACTGGCACATGA 
GAT 



Sequence ID 587 

TTT "^TTTTTCCTTAAAAGGTAACCCCrAAACACAGCTAAAACTATGCCATCAGC
TGACTCCAAGGNACACACAGTCCTGTATCTGGAACTACTGAGTGGCAGGCATCTTT

CTCTGCCTCTGACAGTGGAGTCCCCATCACTGCAGAGCATAGCCAAAGGAGTCAAA 

GGTCTCAGCGGGTCACTGCCTTATCAACCCTCACCAGTCCCTTATGTTTTTTAATA 

TTTTATAATCTTGACATGACACCAAGATGCTTTAATAAAAAAGCACCTCTAACTCG 

GTCTTGTATTCACTTACCTTGAGCCTGGGACTTCTCTAGGCTCCTGAGGCAAAAAC 

AGGTAGAGGGGAGATGGTGGAACATAAAACACAATTTTGCTTGGCACCCACCTTGG 

CGTCTGTCCCCATGACCAGGTCTTTCAATTCGATGATTTTGTCATTGATGGAGGAG 

CGATATCGTTTCTCAATGATATTATGGGTTGTCCGCCTTTCTCCTTCTTTGGGGGG 

CTCAAGCTGCTTGACTCCCCCAGGTACCTGCTTAATGGGGCACTTTCTCTTGCCCC 

ATCATTACAGGCATTGTGGTCAGAATGGTCCCACTGCTGCCCACCAGGGTCTA 

Sequence ID 588 

CTAGTCTTTTCATAGTCTGCATAGAGTCTGGCCATTACCATCAGTTTTTAAGATGT 

CCATATTGTGGCCGGGCGCGGTGGCTCACGCCTGGTAGTCCCAGCACTTTGGGAGG 

CTGAGGCAGGTGGATCATGAGGTCAGGAGATCGAGACCATCCTGGCTAACACGGTG 

AAACCCGTCTCTACTAAAAAAAATATTAAAAAATTGGCCAGGCCTGGTGGTGGGCG 

CCTGTGGTCCCGGCTGCTTGGGAGGCTGAGGCAGGANAATGGTGTGAACCCGGAAG 

TCGGAGGTTGCAGTGAGCCAAGATTGCACCTGGGCAACACAGCGAGACTCCGTCTC 
AAAAAAAAAAAAAA 

Sequence ID 589 

CAATTATTTATTACCTTTCCATTTGTTCGCCTGATGATGTGACAATGCATGGTCTT 

TGTGCATGCTGCTAGACACTTTTCTTTCCCAGCCGAAAAGTCTATTATGTAATTTT 

TACATTCATAATTTTAATGTGGATGATCAGGATTAAATCAAGATATATATCTGGAA 

CCTCTTATAAATGGAGCACTTAGAAATTTGTTGTTCTGCACTTAACCTAGAGAGAG 

AAAAAATGCTTTTCTTTGTGAAAAATCTGAATTCCTGTCCTGACCTTCTGTGATGT 

GGAAACCCTAGGCTCTGAGACACACTCTCTGGTGTCTGAGACAGAACCAAAGCAAT 

AACGTTGTGATGCCCACAGGCCTGGAGCCAGCTAGCGACCTTGTGCCGCCCAGCTG 

TCCATGGCCCGTGCAGAGCAGAGGACAGTGAGTGTCTGCACTGAGAACCTTAAACC 

ACAGTTGAACATACCCACACCTGTTTGTCTTAAGCTATAGTGTAAAAACAAAGTTT 

GGGCTCTGAAAATTTAACTGAAAAAGATTTCCTTGTT 

Sequence ID 590 

GTGGCAGCAGGCGCAGCCCAGCCTCGAAATGCAGAACGACGCCGGCGAGTTCGTGG 
ACCTGTACGTGCCGCGGAAATGCTCCGCTAGCAATCGCATCATCGGTGCCAAGGAC 
CACGCATCCATCCAGATGAACGTGGCCGAGGTTGACAAGGTCACAGGCAGGTTTAA 
TGGCCAGTTTAAAACTTATGCTATCTGCGGGGCCATTCGTAGGATGGGTGAGTCAG 
ATGATTCCATTCTCCGATTGGCCAAGGCCGATGGCATCGTCTCAAAGAACTTTTGA
CTGGAGAGAATCACAGATGTGGAATATTTGTCATAAATAAATAATGAAAACCTAAA 

Sequence ID 591 

CAGCAGCAGAAATGTTTGCAAGATAGGCCAAAATGAGTACAAAAGGTCTGTCTTCC
ATCAGACCCAGTGATGCTGCGACTCACACGCTTCAATTCAAGACCTGACCGGTAGT 
AGGGAGGTTTATTCANATCGCTGGCAGCCTCGGCTGAGCAGATGCACAGAGGGGAT 
CACTGTGCAGTGGGACCACCCTCACTGGCCTTCTGCAGCAGGGTTCTGGGATGTTT 
TCAGTGGTCAAAATACTCTGTTTAGAGCAAGGGCTCAGAAAACAGAAATACTGTCA 
TGGAGGTGCTGAACACAGGGAAGGTCTGGTACATATTGGAAATTATGAGCAGAACA
AATACTCAACTAAATGCACAAAGTATAAAGTGTAGCCATGT 

Sequence ID 592 

TACTCAATGAAAAACCATGATAATTCTTTGTATATAAAATAAACATTTGAAAA?^AA 
AAAAAAA

Sequence ID - 593 nt: 565 

CAGGATCAAGGTGAAAAGGAGAACCCCATGCGGGAACTTCGCATCCGCAAACTCTG 
TCTCAACATCTGTGTTGGGGAGAGTGGAGACAGACTGACGCGAGCAGCCAAGGTGT 
TGGAGCAGCTCACAGGGCAGACCCCTGTGTTTTCCAAAGCTAGATACACTGTCAGA
TCCTTTGGCATCCGGAGAAATGAAAAGATTGCTGTCCACTGCACAGTTCGAGGGGC 
CAAGGCAGAAGAAATCTTGGAGAAGGGTCTAAAGGTGCGGGAGTATGAGTTAAGAA 
AAAACAACTTCTCAGATACTGGAAACTTTGGTTTTGGGATCCAGGAACACATCGAT 
CTGGGTATCAAATATGACCCAAGCATTGGTATCTACGGCCTGGACTTCTATGTGGT 

GCTGGGTAGGCCAGGTTTCAGCATCGCAGACAAGAAGCGCAGGACAGGCTGCATTG

GGGCCAAACACAGAATCAGCAAAGAGGAGGCCATGCGCTGGTTCCAGCAGAAGTAT 
GATGGGATCATCCTTCCTGGCAAATAAATTCCCGTTTCTATCCAAAAGAGCAATAA 
AAAGT 

Sequence ID 594

CAGAAGAGTAAGCAAATCTCAAAGCAGCGAAAGGGAAGAAACTAAAAAAGGTAGAG 
CAGAAATAAGAGAAAATAGAGAAGAGAACAATTGAGAAAAATAATTGAAACCAAAA 
GGTGGTTCTTTGAAAAGCCTAACAAAATGGACACATCTTTAGTTAGAGTGACCAAG 
AAAAAAGGGCAGTGACTCAGATTACTTCATTCAAGAGTGAAAGAGGGCACATCACT 

ACCAATTTACAGAAATAAAAAGGATTATGAGGAAATACTACAGATAATTGATGACA

TTAACTTAGAAGAATATATTTCAAGAAAGACACAAACTACTGAAACCGACTCAAGA 
AGAAACAGAAAATCTGAACAGACCTATAAAAAATAGAGATTTAATTGATATTCAGA 
AAGTTTCCCAAAAAGAAAAGCACTGGCCAAGATGACTTCACTGGTGAATTCTATCA

AGTGTCAAAGATGAATTACTGACATTCATTCACACTCCTTTAAGAAATAGAAGAGG 

GGACATCACTTTTCAAAGCATCGACATTCTAATCATTAGTCCCTTGGTTTCCTGCT 

CCCAAAGCCAGGTGATGTATCACAAAAAAACCCCTACAGACCCACTGGGCACAATG 
GCTTTATGCCTAT 

Sequence ID - 595 nt: 98 

CTTTGCTCGAATNGTCAGATAAGGATTCTGTGAANGGAGATGAGATTTCCATCCAT 
GCTGACTTTGANAATACATGTTCCCGAATTGGGGNCCCCAAA 

Sequence ID 596 

CTCAAGTGTTCCCTCAGCTTAGGCTTTGTTTAAATGATCCCACCCAGGGGCGATGG 
TAGGGAACAACAGGGTCACTAAACTATTTGGCTGGCTACAACTCTGGGAAATGGTA 
AGACAGGGAAAGGCCATGTTGTTCATTCCCTTGTGCAGATCTAGGGAGAACCGCAG 
AGAGAACAGTTAGCATTTCTTGTTCAATGAATTATCCTATTAAGAACACTGGATGT 

Sequence ID 597 

CGGNCGCGGTCGACGCTACTCCTACCTATCTCCCCTTTTATACTAATAATCTTATA 
AAAAAAAAA&AAANAAAAAAAAAAA 

Sequence ID - 598 nt: 362

GGCATGTGCCTGTAGTCCTAGTTGCTGAGGTAAGAGGATTGCTTGAGCCCAAGAGT 

TCAAGGCTGCAACAAGCTTTGATTGCGCCACTGCACTCCANCCTTGGCGACAGACT 

AAAACGCTGTCTCAAAAAAAAAACAAAAACGAGNAAAAAAAAACAAAACAGAAAAA 

ATTAACTTAGGCAATGAC^GTCCCTGGCAAATGCTGGGAGGGAGGC^yVCANTGGTC 

AAGGAAGGTAACCCTGAANCAGGACTTGTAAAGCAAATAANATTGGGAGGCCAAGG 

TGGGTGGATCACNAGGTCAGGAGTTCGAGACCAACCTGGCCAACATAGTGAAACCC 
CGTCTTTCTAAAAATACAAAAAAATT 

Sequence ID 599 

GACAAAAGAACCATTTGGATACATAGGTATGGTCTGAGCTATGATATCAATTGGCT 
TCCTAGGGTTTATCGTGTGAGCACACCATATATTTACAGTAGGAATAGACGTAGAC 
ACACGAGCATATTTCACCTCCGCTACCATAATCATCGCTATCCCCACCGGCGTCAA 
AGTATTTAGCTGACTCGCCACACTCCACGGAAGCAATATGAAATGATCTGCTGCAG 
TGCTCTGAGCCCTAGGATTCATCTTTCTTTTCACCGTAGGTGGCCTGACTGGCATT 
GTATTAGCAAACTCATCACTAGACATCGTACTACACGACACGTACTACGTTGTAGC 
TCACTTCCACTATGTCCTATCAATAGGAGCTGTATTTGCCATCATAGGAGGCTTCA 
TTCACTGATTTCCCCTATTCTCAGGCTAC^CCCTAGACCAAACCTACGCCAAAATC

CATTTCACTATCATATTCATCGGCGTAAATCTAACTTTCTTCCCACAACACTTTCT 

CGGCCTGTCCGGAATGCCCCGACGTTACTCGGACTACCCCGATGCATACACCACAT 
GAAACATCCTATCATCTGGAG

Sequence ID - 600 nt: 595

TTCAAATTCTTGNTAANAGTCTTTGTTCTGAATTTTACTTTGTCTGTTATTCCTAT 
AGCCTTTCCAATTTTCTTTCGCTTGGATTTTACGTGATAAGTTTTTTCCCCCATTT 
TACTTTTANCAACTCTATATTTTTTAGTTGAGGTTGGGTTTCTTGTAAACAGCATA 
TAATTTGGGTTTT^TTAATCCAATCTGAAAATTAATGTCCTTAATTTTGTGTTTATA 
CCATTTACACATAATGTACTCATATATAAGGTTTAACTGAAACCTACTATCTTGCT 
AGTTGTGCTCTACTTGAATTTTTTTTTAGTATTCTGTTTTAATTGACCAACATTTG 
ACTGTATCTCTTTGTGTAATTCTTTTACAGGTTGCTGTAGGCATGACAATATATAC 
ACTTAACTTTTCTCAGTACACTGAGAGTTGAAATTGTAGTACTTCGAGGAAAACAT 
AGAAAACTTGCT^TGATATCGGTTACATTTTACCACCTCCATATGTTGCAATTATT 
AAATGTATTAGATCTGCCTACCTCGAAAACCCATCAGTCTTTTAACTTTGCTCTCA 
ATGGTGATTCATATTTTTAAAAAAACTTGAGGCAA 

Sequence ID - 601 nt: 522 

TCGACCGGGTTTGGAGCAGTGCCTTGTTTGCTGTGCAGCGGATACTCTACAGGTAC 

ATTTCCTTTTTGGAACCAAAAGGGAGGGATTTGACAATATTGATGGTAGATCTTTT 

TTCTTTAGCAAGAATTAAGGATTTTGGTGGGTGGGGGGAGGCTTCTGTGGGGACCA 

AGACAATGTACTGTCAGTCAGGATTTAAGTCGAACTACCTCATCCCTTGCCCCAGA 

GAACAGTTGATCGTGTTTTAAACCAAAAGGTGCGGAATGGAGAGAGGGAGGCGGTG 

CATTGCAGCTTCCGATAGAGCTTTTTATTTTTGGATATCAGGAACCAATTTTGAAG 

ATTTCTTAAGAAAGTCATTTACATCAGGGACATGAAGAGCAAAGTAGGTATTTTTG 

GTCAGTACTTGAATTTGATAGGCTTTATGCAAACAACTCTCCCTCTGCTGGAGTCT 

GGCAAGTTTGCTTTTCACTGGACGCTAATTG^GTGCCATACAAAACTAAAATAAN 

AGTTTTACTTATAACACA 

Sequence ID 602 

CAGAAATCGCAATTGAAGACCAGATTTGTCAAGGTTTGAAACTGACATTTGATACT 
ACCTTCTCACGaAACACAGGAAAGAAAAGTGGTAAAA 

GGAGTGTAT/^AACCTTGGTTGTGATGTTGACTTTGATTTTGCTGGACCTGCAATCC 
ATGGTTCAGCTGTCTTTGGTTATGAGGGCTGGCTTGCTGGCTACCAGATGACCTTT 
GACAGTGCCAAATGAAAGCTGACAAGGAATAAC 

GGACTTCCAGCTACACACTAATGTCAATGATGGGACAGAATTTGGAGGATCAATTT 
ATCAGAAAGTTTGTGAAGATCTTGACACTTCAGTAAACCTTGCTTGGACATCAGGT

ACCAACTGCACTCGTTTTGGCATTGCAGCTAAATATCAGTTGGATCCCACTGCTTC 

CATTTCTGCAAAAGTCAACAACTCTAGCTTAATTGGAGTAGGCTATACTCAGACTC 

TGAGGCCTGGTGTGAAGCTTACACTCTCTGCTCTGGTAGATGGGAAGAGCATTAAT 
GCTGGAGGCCACAAGGTTGGGCTCG 

Sequence ID - 603 nt: 624

GACACACGAGCATATTTCACCTCCGCTACCATAATCATCGCTATCCCCACCGGCGT 

CAAAGTATTTAGCTGACTCGCCACACTCCACGGAAGCAATATGAAATGATCTGCTG 

CAGTGCTCTGAGCCCTAGGATTCATCTTTCTTTTCACCGTAGGTGGCCTGACTGGC

ATTGTATTAGCAAACTCATCACTAGACATCGTACTACACGACACGTACTACGTTGT 

AGCCCACTTCCACTATGTCCTATCAATAGGAGCTGTATTTGCCATCATAGGAGGCT 

TCATTCACTGATTTCCCCTATTCTCAGGCTACACCCTAGACCAAACCTACGCCAAA 

ATCCATTTCACTATCATATTCATCGGCGTAAATCTAACTTTCTTCCCACAACACTT 

TCTCGGCCTATCCGGAATGCCCCGACGTTACTCGGACTACCCCGATGCATACACCA 

CATGAAACATCCTATCATCTGTAGGCTCATTCATTTCTCTAACAGCAGTAATATTA 

ATAATTTTCATGATTTGAGAAGCCTTCGCTTCGAAGCGAAAAGTCCTAATAGTAGA 

AGAACCCTCCATAAACCTGGAGTGACTATATGGATGCCCCCCACCCTACCACACAT 
TCGAAGAA 

Sequence ID - 605 nt: 338

ACCTGAGGCCTCGGTGGGGCCAGTGCGACGCTGGCTTAAGGAGCTGGAGGGGTTCC 

TAATACACATTTAATTCAGTTTCTCTTCCCTAAGAGGCTGCCGGAGTTGGGGCCTC 

CTCCAGCAGAGACCCTCGGACCCCTGCAGGGCCTGGACTTGGGGTGAACAGGGCTT 

CAGTCAGCGCAAGTATTCCATTTGCATTTGGTAATTTTTCATGCCACCTATTTATG 

AATATATAAATCTTTATACCAAATCTATTTTTTAAAACATGGAAAAGTTGCCTTTA 

TGGAAACTTGGCAGAGCCAGAGTGTACACATTCCTAAAGCATTAAACAGATTTCTA 
TA 

Sequence ID - 606 nt: 555

GGATAATGATACCTCTGACCTTTCTTCCTTTTGGGAAGTACTTGAGTGTGCAGCTG 
CATGAGGCCTCAGCAGGAGAGAGATTTTAGGTCCAAGAAGCTATACCAGTAGGACA 
AGGCAGGAAAATACTACACTTTCAGGATCAAGCCCCTCTGACTCTCATTTGGAAAC 
TGGATGTTTGCTAAGCACCTGCTTCTTAAGGATGCCGAGGGATTTAATGATACTCC 
CAGAAACCTGGAGAGATTAATGGGGCCTATGGAGAAGTGCTCTGAACTCAGTGTTG 
GGACTTGAATAAAATTAACCATTGTCATGTTTTCAGAACAACTAAGCTGTTTTATA 
TTTCATGTGCATGAAAGCCCTAGAACTAAGTTGTGTTATTTCCAGAAATGAAATAG 
ATCCCACAGTTAGATGATGTGGCCATTAGGAAGTACCAAATTTATAAAAATCACTG
GAGGTCTGTCTGAGCAGTACCTAATAAAATATAGTATACTGAAAGTGAACAGATCT 
TTGTCTCTTTCTTTGGCTGCTTGATACTTTATCTGTGTCTGCCGGACAGTGC 


Sequence ID 607 

CAATA?^AAGCAGGTTAA.CCTCAATGATAGCAGTTAAAATGTTCTATCTTATGTATT 
TCTTTTAAGTATTACCATTATGGTGCTACTGAGCGTTTTCTTTTGGTAAAAAGAAA 
AATGCCATGGGCTGCAGTCTTCTTCCATCACTTTTCCCTACCAGGTCCATTAATAT 
GCTTATAACACTAGTGCCAGTTATTTTATTTGATAATGCTTATGGTATTTGTATAT 
TTGTTTGCATTCCAATTTTGTTTAATAATGAGTGTGTAAACTGCATACGTTAAATA 
AATGTAAATACTAATGTACTGCTGC 

Sequence ID 609 

TTTTATTACCCAAGTTTTAACCTCTGTCTGGTGATTTGTTGTTGTTGTTGTTGTNG 
TTGTTGTTGAAGTTCAGGCTGCATGTGGGATAGGTTTGCTCAGGCATACTTCTTAG 
GAAGTAGTCACTTGCATGACTGTTTTTGGGATAACTCTTTGAGTATTTGGAGAGGT 
CTATTGTAACTTCTGAAAGGCATTGTTTTTACGTATGAATGTTCTAAAATTCATTC 
TAAATGGTCATGAAAAGAAAAGGATTCACATTTTAGAATGGCAATAGTCCCTGAGG 
ACTATTATGTCTTTTAGATTTCCTGTGGGTTTCTAGGAATGTTAGTGTAACTTANA 
TTTCCACCTACCTGATTTCTGGATGTGCCTATTGGAACTTGCTGAGATCTTTTTTT 
TTCCTTAACATGTTGTCCCCTTGACCCGTACTTCGAAACTAAACATATTATTTTAT 
TTGCTTACACTTCAGGAGGCAATTGGCAGACACCAGGCCAACAGTCT 

Sequence ID 610 

GCTCTGACCCCAGTTGGAAATGTATCTGTACTTTGTCCGGCTTCCACTCAAGGACC 
ATTTATGACATTGCTTGGTGTCAGCTGACAGGGGCTCTGGCCACAGCTTGTGGGGA 
TGACGCGATCCGCGTGTTTCAGGAGGATCCCAACTCGGATCCACAGCAGCCCACCT 
TCTCCCTGACAGCCCACTTGCATCAGGCCCATTCCCAGGATGTCAACTGTGTGGCC 
TGGAACCCCAAGGAGCCAGGGCTACTGGCCTCCTGCAGTGATGATGGGGAGGTGGC 
CTTCTGGAAGTATCAGCGGCCTGAAGGCCTCTGAGCTACCTCGACTTTGGACAGAG 
TAATGACTCCCCAGAAAACGTCATATAAGACTTTACCAGCCCCTGAGAGGACCAGG 
AGGAGCATCCTTGACCTTCATTTAACTTGGCTCACTTCTCTTCANACTTGGGTAGA 
AGTGCAGAGCCACAAAATTGCTTTCCTTCCCCGCCTTTGACATGAGGCCTTCAGTA 
AAG 

Sequence ID 611 

TGCAGGATCCGTCGACT




Sequence ID - 612 nt: 576

GAGAAATATAAGATTATGTATAGATCAAATCTACCTCTATTTGGTGTCCTGAAAGA 

GATGAGGAGAATGGGACAAAOTTGGAAAGCTTATTTCAAGATAACATTCCTGAGAA 

CTTCCCCAATCTTGCTAGAGAGGCCAACATTAAAATTCAGTAAATGCTGAAAACTC

CAGTAAGATATTTCTTAAGAAAATTATTCCCAAGATATATACTCATCAAATTATCT 

AAGGTCAAATGAAGGAAAAAATTTTATAGGCAGCTAGAGAGAAATGTCAGGTCACC 

TACAAAGAGAATGGC^TAAGACAAAAAGTAGAACTCCCAGCAGAAACTCTAAAAGC 

CAGAAGAGATTAGGGGCCAATATTTAACATTCTGAAAGAAATTCCAACAAGGAATT 

TCATATCCAGCCAAACTAAGOTTCATAATTGAAGGAGAAATAAGATATTTTCCAGA 

CAAGCAAATGCTGATGAAATCCATCACCACCAGACCTGCCTTATAAGAGCTCCTGA 

GGGAAGCACTAAATATTGAAAGGGAAGAACTTTATGAACCATTTCAAAAACACATT 
TAAGTNCACAAAGCAG 



Sequence ID - 613 nt . 3 

CCTTATTTTACAGGTGAAAAACCACGAAT(^GATAGATTTTTATTTGCCCAAGTCA 

CATAATATTAAGAACAGGCCAAGTGTGGTGGCTCATGTCTGTAATCTGAGCACTTT 

GGGAGGCTAAGGCGGGTGGATTTCCTGAGCCTAGGAGTTTGAGATCAGCCTGGGCA 

ACATGGCGAAACCTCATCTCTACAAAACATACAAAAATTAGTCAGTGTGGTGGTGA 

GAGCCTGTAGTCCTGGCTACTCGTGAGGCTGAGGTGGGAGCATCACCTGAGCCTGG 

GAAGTCGAGGCTGCAGTGGCAACAGAATGGGTAACCTGGACATCAGAGTGAGACCC 
TGTCT 

Sequence ID 614 

CTCACACCTGTAATTCCATTACTTTGGAAGGCTGAGAGAGGAGGATCAGTGGAGCC 

CAGGAGTTTGAGACC^GCCTGGGCAATATAGGGAGACCCTGTCTCTACAAAAATGA 

AATAGCCAGGCGAGGTGGCATGTGCCTGTGGTCCCAGCTACTTGGGAGACTGAGGT 

GGAAGGCTGCCTTGAGCCCAGGAGTTCCAGGCTGCAGTGAGCCATCATTATGCCAC 

TGCACTCCAACCTGGGAGACAGAGTGAGAGAGACCCTGTCTCAAACAAACAAACCC 

AAAATAGGCCAGGCACAGTGACTCATGCCTGTAATCCCAGCACTTTGGGAGGCTGA 

AATAGGCGGATCATTTGAGGTCAGGAGTTCAAATTCAAGACCAGCCCGGCCAACAT 

GGCAAAACCACATCTCTACTACAAATAAAAAATTAGTTGGGTGTGGNGGAGCATTC 

CTGTAATCACAGCTATTCAGGAGGCTGAGGCATGANAACCGCTTCA 

Sequence ID - 615 nt: 379

TAAATTTAAAACATTTTAATTAGCTGGCATGATGGCATGCACCTGTAGTCCTACCT 
ACTTGGGAGGCCAAGGCAGGAAGATTGCTTGAGCCCAGGAGTTTGAGCTTACTGTG 
AGCTGTGATCACACCACTGCACTCCAGCCTGGGTGACAAAGGAAGACCGTATTTCT 
AAAAAATAAAAAATACAAATACAACTACAAACTAGCACTAGACCAACAGTGACTAT

GTACCATGAACTGAGGAATATTATTAATTCCACCATTTGCATCTGAGGTTAACAAT 

ATGTCAATGACTTAAATAACATCATATCTCTGAGAGTAATTTCTCCTATATTTCCA 
TGACAAATGTTAGATAATTTTCCATTTTTTCCATTCAACAAAA 

Sequence ID 617 

TTTTCAGGCATGTCAGAGAAGGGAGGACTCACTAGAATTAGCAAACAAAACCACCC 

TGACATCCTCCTTCAGGAACACGGGGAGCAGAGGCCAAAGCACTAAGGGGAGGGCG 

CATACCCGAGACGATTGTATGAAGAAAATATGGAGGAACTGTTACATGTTCGGTAC 

TAAGTCATTTTCAGGGGATTGAAAGACTATTGCTGGATTTCATGATGCTGACTGGC 

GTTAGCTGATTAACCCATGTAAATAGGCACTTAAATAGAAGCAGGAAAGGGAGACA 

AAGACTGGCTTCTGGACTTCCTCCCTGATCCCCACTCTTACTCATCACCTGCAGTG 

GCCAGAATTAGGGACTCAGAATCAAACCAGTGTAAGGCAGTGCTGGCTGCCATTGC 
CTGGTCACATTGAAATTGGTGGCTTCATT 

Sequence ID - 618 nt: 598

GATTAACTTTCATTTTAAGCTCTTCTCTACTAATTCTGTTCGTATGTTTATTCATT 

TTGCGTTGATCATATTTTGTACACCAGGCACTCTTCTCAGTTTTATATGTGTGTTA 

ATTTACTCCTTTCAAGAGCCCTATGATACATGAATTTATCTCCATTTTATAGATGA 

GGAAATTAAGACCTAGAGTTACTGAACTTGCCCAAGGTTATACAGCTGATGGGTAG 

GGCCAGAACTTTGCCTCAGAGAATCTGAATTTCCAAAAAATAACCTAAAAGAGAAA 

TTTAAGTACTAATTAGTAAGCAAAGAAATGCACATTTAAGGAAGACAGTGCACATT 

TAAGGAAGACAGTAACCTTTTATCTATTAGAGAAAAACACACATTCTGTCTTTAAC 

ACACACATAAATCTTATATTGGCAGGGATTTTCTTTATTCAGCAATTATTTATTGG 

TTGTCTGCTTTGTGGTACACATAAATGCTGGGGATAAACACTTAATAAAATATACT 

TCCTTCTCTTGAATATCTTGCACTTTAAGTGGGAAGGTAAGTCAACAGAGTAGAGG 

TGATATATCCAAGTGATAGACTGTTTCATTGCCAGTAG 

Sequence ID 619 

GTTGCCTGAGAGTGACCTTTGCATCTGCCTGTCCAGCCAGCATGGAACCAAAGCGG 
ATCAGAGAGGGCTACCTTGTGAAGAAGGGGAGCGTGTTCAATACGTGGAAACCCAT 
GTGGGTTGTATTGTTAGAAGATGGAATTGAATTCTATAAGAAGAAAAGTGACAACA 
GCCCCAAAGGAATGATCCCGCTGAAAGGGAGCACTCTGACTAGCCCTTGTCAAGAC 
TTTGGCAAAAGGATGTTTGTGTTTAAGATCACTATGACCAAACAGCAGGACCACTT 
CTTCCAGGCAGCCTTCCTGGAGGAGAGAGATGCCTGGGTTCGGGATATCAATAAGG 
CCATTAAATGCATTGAAGGAGGCCAGAAATTTGCCAGGAAATCTACCAGGAGGTCC 
ATTCGACTGCCAGAAACCATTGACTTAGGTGCCTTATATTTGTCCATGAAAGACAC 




TGAAAAAGGAATAAAAGAACTGAAT

Sequence ID 621

TGGTACTGAACCTACGAGTACACCGACTACGGCGGACTAATCTTCAACTCCTACAT 
ACTTCCCCCATTATTCCTAGAACCAGGCGACCTGCGACTCCTTGACGTTGACAATC 
GAGTAGTACTCCCGATTGAAGCCCCCATTCGTATAATAATTACATCACAAGACGTC 
TTGCACTCATGAGCTGTCCCCACATTAGGCTTAAAAACAGATGCAATTCCCGGACG 
TCTAAACCAAACCACTTTCACCGCTACACGACCGGGGGTATACTACGGTCAATGCT 
CTGAAATCTGTGGAGCAAACCACAGTTTCATGCCCATCGTCCTAGAATTAATTCCC 
GTAAAAATCTTTGAAATAGGGCCCGTATTTACCCTATAGCACCCCCTCTACCCCCT 

Sequence ID 622 

TTTTTCTTGTTTTTGTGTGTCTACCTTGGCATATACTAAAGGAAGGTGTGTATTCA 

TTTATTACATGATATCTCTGGGTTATAATTATTTACATATATGAATTTGAAAGAAA 

GATTGAGAGGGATATGTGTGACCTTTGTTTCATTATGATCATTTACATGACTAAAG 

ATAAAGATCATATGTCTGATTTTCAGTTTAATGGCAAGTTACTTAAAATAAATGAA 

ATATGTTTTTATTGTTTTCGTGGGTTTGATGCTTTGTGTTTTATTTCAAGTAACTT 

GAGAATGCATTGTGTTTGGTACTGTTTTTTATGAATATCATTAAAAATTTATTTAA 

GGAGAGAGTAATTTTGCAATAATATTTTTGATTTATTTGAAAATAAAATTCAAGAT 

AAATGAAATAATTGAAATTTTCTAAAGAAGGAATTGAATATATTTTTACATTTGAA 

TGAACTAAGGATTAACTGAACCATTTATATATAGTACTTTCAGAACTGAATGTCTT 

AAATGATAAAGCTCTAATTGGTTAAAGTGACTTTCTTTCAAGTCAAAGAACCCAGA 

AACTGAATAGATGATCTAACTACTGCCACTGAGGTTTTGGATTAGTGAGTATAAAT 
TT 



Sequence ID 624 
TGCAGGATCCGTCGACT 

Sequence ID 625 

GACAATCAGAGCAGATCTTGGGCTTCTGTGGCTCATCTCAGCCCTTTATAACTGGC 

CTGAGAAGAGGGTTTATCTACTTGTGCAAGTGGCCCAGAAATCTCACTCGTACATG 

AGGCTTTGGAACATCCTTGCAAAGGTACGCTGAAAGCAAATTGCTGTTTTCCTGGT 

GGTTCTGCACGTTTCCTAACTTTTATCATAGTTTGATTTTCATTATTTAAGAAAAA 

ATAAAAAATCCAAAGACCATAAGATGGCATTAGATTTTTTACCATTAAATTATTAA 

TGCCTATTTGGTGCTCATAAAGATTAATCATGTCACGCATGTTTCCAATCTTTCTT 

TTGCAGTATATTATTTTCTAAAAATTGTTACATGCAAATTTAAACCAAGATTTATC 
AGTA 

Sequence ID 626

TTGGAAGAAATAAACCAAGGCAGAAAAATTTTAAATGGCCAAAATAAATTGTATTG 
CTAACTTAGATGGCCACAGATGGGGGCAGGGGTGGAGAGAGGAGAAATTGAAAACN 

CCACAAAGACCCCGCAATGGCTAGAACTTGAAATCTCTGGATATTGCAACAATAGC

AGCCTCCTTAAGTCAGCAAAAAGATAAAGATTGATCCAATGTTCTATATTACAGAA 

CAGAGCAGATTGTCAATATAGCAAATAAAGTTACCGTTGAGTGGACTGCGCTGTNT

AAGCTGCTTGGTTGGCCTTAAGTGCCGACAATTAAGAGATGAAGGCAATGAGAACT 

GAAACAAACATTTAAGTTCAAGACCCAGTTTACTGACACTGGGACTATTACTATAT

CTCTTTGGGCCTCAGTTTACTTATCTGTAACATTAAGAGGTTGGATTACATGATGT 

CTCACGATTCOTTTTTTTTATTTAGAGATGGGGTTTTGCTCTGTTGCCCAGGCTGG 
AGTGCAGTGGCATGATCATAGCTCACAGCAG 

Sequence ID 627 

CCAGCCTGTCACTGGCCTGGCCAAGGAGGAGAGACAGGCCAGGGATTCTGGTCCTA 

ACTCTACTGGCCACACTGTGTGGCCTGAGACCCCCCTTTCCCTCCCAAGCCCCTGC 

CTCCGCATCTGCGTGGTGAAGGCCATTGGCCCTCATCGGTGGATCTGCGTTTCCTC 

GGGCCTACACTGTCTAGGATTGTGCGGGGCTGGTGAGAGAACAAGATCTCTTCCGT 

GTTCAAGGCAGACTTCCTGCCCCCTGCACCCTGCTCTCTCCCAGGCCTTGAGGTCA 

GTGTGAGCCCCAAGGGCAAGAACACTTCTGGAAGGGAGAGTGGATTTGGCTGGGCC 

ATCTGGATGGAAGGTAAAAAAAAGAAAATCCCTTGAAAGGAGATTGAGGGAAGTTT 

Sequence ID - 628 nt: 419

AAGAGAAAGGACTCAGTGTGTGATCCGGTTTCTTTTTGCTCGCCCCTGTTTTTTGT 

AGAATCTCTTCATGCTTGACATACCTACCAGTATTATTCCCGACGACACATATACA 

TATGAGAATATACCTTATTTATTTTTGTGTAGGTGTCTGCCTTCACAAATGTCATT 

GTCTACTCCTAGAAGAACCAAATACCTCAATTTTTGTTTTTGAGTACTGTACTATC 

CTGTAAATATATCTTAAGCAGGTrrGTTTTCAGCACTGATGGAAAATACCAGTGTT 

GGGTTTTTTTTTAGTTGCCAACAGTTGTATGTTTGCTGATTATTTATGACCTGAAA 

TAATATATTTCTTCTTCTAAGAAGACATTTTGTTACATAAGGATGACTTTTTTATA 
CAATGGGAATAAATTATGGCATTTTTT 

Sequence ID 629 

CTGAGAGTCACTGTGTTTTTAGCCAAATCTAAGGGAGAAAATGAATATTGATAGCA 
GCATGCTGTAGCCAGCTCCTTAAAGGAAGGATGGTGCCTGGTACAGAGTTAGAGTT 
AGTGCTTCAGTAAATAATGAATGTGTGCTAGGTAGGTTCTGCTGGGTAGGCTGCAT 
GCATTGACCAATTTATTCCTCCTTGTTTCAAAACAGGATTTAAGGGCACTTATATA 
CTATATATGTATATATGTATATATATGTCTATATGTCTATATGTATATATGTCTAT

ATGTATATATGTGTGTGTGTATATATATATATATATATATAAGTTTTCTGTTGCTA 

GCATAACAAACTACCAGAAACTTAGCAACTGAAACAACATGAATTTATCTTACGGT 

l-CTATAGTTCAGAAGTCTAACGTGTCACTGGGATGAAATCCAGGTTTCAACAGGAC 

TGGGTTCCCTTCTAGCTCATTCAGCTACCTGGCTCATTCAGGTTGTNGGCAGAATA 

TACTTCCATGAAACTGTAGGGCTGAGACCCCGTTCCTTCCTGGCTATCATCTGAAA 
ACTTTC 



Sequence ID 630

AGGCGCAGCCCASCCTCGAAATGCAGAACGACGCCGGCGAGTTCGTGGACCTGTAC 

GTGCCGCGGAAATGCTCCGCTAGCAATCGCATCATCGGTGCCAAGGACCACGCATC 

CATCCAGATGAACGTGGCCGAGGTTGACAAGGTCACAGGCAGGTTTAATGGCCAGT 

TTAAAACTTATGCTATCTGCGGGGCCATTCGTAGGATGGGTGAGTCAGATGATTCC 

ATTCTCCGATTGGCCAAGGCCGATGGCATCGTCTCAAAGAACTTTTGACTGGAGAG 

AATCACAGATGTGGAATATTTGTCATAAATAAATAATGAAAACCTAAAAAAAAAAA 
AAAAAAAAAAAAAA 



Sequence ID 631 

TNCACTCACACACTCCCAAACCTTAACAAACACATACATGTGCAGCCAACCCAATG 
GGCCAGCCTCTTTTATGCTCCTCACATGTTTCCTTTAACTGGAATACCCATGACAG 
CTCCCTACATAGTTACTTGTAAACTCCTCCTCTCTGTATAAGTTTTCCTGAATTTT 
TTTGATAAAATTAAGTTGTGCCACCCCTTTATGCTCTCTTANAACTTTGTTCTGTT 
CTCATGGCTGTTCTGCAACGAATCTCATTGTGTTCTCCTACTCAATTACATTCCTG 
CGTCTCCCACTAGATGGCAGACTCTTTGAGAGTAGGAGATTCCCTTGTTATCTCTG 
GATCCCTGGCACTTGCAGAAAGCCTGTTACGTAATAATTGCTCAACAATTAGTTTT 
TAAATAAATGAATTATTTTTAAAACGCCAAAATTACAATGATTGTGCATTAAGTGA 
AAGATGACCATCTAAAAACATAAAGCCATGCTTCATGACATTGGC 



Sequence ID 632 

GACCATTCAGGGAAATTTTATAAAAAATGCAGATACTGTCTTGAGCAGATCGAAAT 
GCCGATGAGGTGGATGCAATTTCCTTTTGTGCAAGCAGTGCACGGTGCCCCCCCCT 
CGGGTGTCCGTGCTGTGCCTTAGCTTCCCCAGGTGCCGGGACTCACACCTGCTAGG 
GGCTGGGCAAGGCCCCGGCTCTGCTTTCTCTGAAGGGCTTGTCCAAGTTCATTGCC 
CTGTTACAGGTGGTCAAGACGTCCGGCCGCCTTGACCCAGGCTACCCTTAGCCAAT 
ATCCTCTGCCCCTGGGTGGTTGGTGGCTGGGCCTCAGGGTGGGCAACGTTAGGGGT 
TTGGCGAAAGCCCGCCCCATGGGATTGAGGGACGGGGCTGCACTCCAACCGTCTGC 
ACCTGCTCTTCCCCCACCCCTGTGGGACCTCATCTTCACGTGCCATGTGTGCTGAA 
GGCCCAGGGCCCAGCAGGGGGCAGTGGCACCTGTTGACGGAAAAGCCGAGGTGCTT

ACCAATGGACCTTCTGGCCCGCCCTCCCCTGTACTTGTCGGGCATTCAGGGCCCCG 
ACCTGTGCCTACCCGCA 

Sequence ID 633

CAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCACCTGACTCCTGAGG 
AGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGT 
GAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTC 
CTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTC 

ATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTC
AAGGGC^CCTTTGCCAC^CTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCC 
TGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTG 
GCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCANAAAGTGGTGGCTGGTGTG 
GGCTAATGCCTGGCCCCACT^jAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTTC 

TATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAA
GGGCCTTG 

Sequence ID - 634 nt: 511

TTTTTTAATTTCACCAAAATTTGTTGACGTCCCTTGATTTGCTGATAGGGACAATA 
ATTAAATATTTTCCACTTGTTTTTATAAAAACTGTAATGGTGATTTGTTTAACAGA
TGTTGACTTAGCACCTTCTCTCTTTTTTTTTTTTTTTTTTTGAGTTGGAGTCTTGC 
TCTGTCACCCAGCTGGAGTGCAGTGGCACGATTTCGGCTCACTGCAAGCTCCGCCT 
CCCAGGTTCGGGCGCTTCTCCTGCCTCAGCCTCCCANATAGTTGGGATTACAGGTG 
CATGCCGCCACNCCTAGCTAATGTTTTTTGTATCTTGGTANANATGGNGTTTCACC 

TTGTTGCCCATGCCGCTCTTGAACTCCTTGGCCTCCCAAAGTGTTAGGATTACAGG

CGTGAGCCACTGTGCCTGGCCCCAATTTANCACCTTACTGGGTGCTGAGGCTGTGA 
GCCATAGTAGAATGCATGTGATCCAGGGCCTTGCTGAATTCATGGGCTAATAGGGA 
GCCTGAC 

Sequence ID - 635 nt: 592

TGAGCGTTGGGCTGTAGGTCGCTGTGCTGTGTGATCCCCCAGAGCCATGCCCGAGA 
TAGTGGATACCTGTTCGTTGGCCTCTCCGGCTTCCGTCTGCCGGACCAAGCACCTG 
CACCTGCGCTGCAGCGTCGACTTTACTCGCCGGACGCTGACCGGGACTGCTGCTCT 
CACGGTCCAGTCTCAGGAGGACAATCTGCGCAGCCTGGTTTTGGATAGAAAGGAC^ 

TTACAATAGAAAAAGTAGTGATCAATGGACAAGAAGTCAAATATGCTCTTGGAGAA

AGACAAAGTTACAAGGGATCGCCAATGGAAATCTCTCTTCCTATCGCTTTGAGCAA 
AAATCAAGAAATTGTTATAGAAATTTCTTTTGAGACCTCTCCAAAATCTTCTGCTC 
TCCAGTGGCTCACTCCTGAACAGACTTCTGGGAAGGAACACCCATATCTCTTTAGT

CAGTGCCAGGCCATCCACTGCAGAGCAATCCTTCCTTGTCAGGACACTCCTTCTGN 

GAAATTAACCTATACTGCAGAGGTGTCTGTCCCTAAAGAACTGGTGGCACTTATGA 
GTGCTATTCGTGATGGAGAAACACCTGACCCA 

Sequence ID - 636 nt: 572

CTTANAAGAGTTGCTCATTCACACCCACGCCCTTGCCCAAGGCTGGCCCACTCAGA 

GCGAAACTTAACTTTTGTCTGGATGGGAAGAGAAGTAAGTCTACCCCGAGGTTGCC 

ATGTTGAAGAGTGAGAGGTCCAAGTGATTCTGTGCATTGAAACCAAGACACCCCAC 

CCAGAACACTTCTTCCCTCCCTCAGCCCAAACCAAAGGCTGGGGTTCTCATCTCCA 

AGTGGCTGTTCTCCAACTTTCCCAAGCCGCTTGCATTCCCCAGACTGGACTACTGT 

GGCGGTTAGGTTAGATTTGAAGACGGGGCCCAGGCTGGGTATGAACGGGTGCAGCC 

CTCTTCTCCTCTTCCCCCCCACATCTCTCATGAGAGAGGTAGTGGCATTTCCTTCT 

CAGGGAGCTTCAATGGGAAAGGTCTCGAAAGCTTCAGGAGGAGCAGAATACCAACG 

CAGGGGGATGGCTGTAACGATCTCACCGTCTCCTAACCTCAGTCCCTTTTTTGAGA 

GTGAATGGTGGAGGGTGGGAAAGGGACCCAAATTTGTAGATCTCTTTGTCTGGGGG 
AGGGGAANGATG 

Sequence ID - 637 nt: 482

TTAAAACAGGCGCAGGGGTAAAAATGAGAATGAATCTGAAAAAAGAGAGTTGGTGT 

TTAAAGAGGATGGACAAGAGTATGCTCAGGTAATCAAAATGTTGGGAAATGGACGA 

TTGGAAGCATTGTGTTTTGATGGTGTAAAGAGGTTATGCCATATCAGAGGGAAATT 

GAGAAAAAAGGTTTGGATAAATACATCAGACATTATATTGGTTGGTCTACGGGACT 

ATCAG^ATAACAAAGCTGATGTAATTTTAAAGTACAATGCAGATGAAGCTAGAAGC 

CTGAAGGCATATGGCGAGCTTCCAGAACATGCTAAAATCAATGAAACAGACACATT 

TGGTCCTGGAGATGATGATGAAATCCAGTTTGACGATATTGGAGATGATGATGAAG 

ACATTGATGATATCTAAATTGAACCAAGTGTTTTTACATGACAAGTTCTCTGAGGA 
TGGTTCTACAGTTGGGATTTTGGCCATCATCAAC 

Sequence ID - 638 nt: 545

TTTGAAGGCAAAGAGGGATTAATCTGTGCTGGCATCATGTAAGGAGACTTGATAGA 
TAAGAAAAAGCTTTACCTAAGTTTTGAAGAATAGGTTTTTCATAATGGAAAATTTA 
AGGGAAAAATCTCCAAAAAAGTGCTACTCAAGTTTTATCCATTTGTATTTCCAACA 
CAGCCTAGGACAGTACCTGCACATAGTAGGTGATTAATAAAAATTTAGAAAGCATT 
AATACTAAAGAGGAAAAATAGCAATGGCAAGAAAACACATGTAGGGAACACATGTA 
GCCAAAAAATAATATATAATCAGAGAAATAATAGGACTTCTGGAAAAAAAAGATGA 
GATCAGATTGGTTAGGATCTTTACTAACATGACAAGAGCATGAATTTTTTTTCTGT 
AGATAATAAGTATGAAAGAATTTTAGCTTAAAAATTAGCATAATTTGGATCCACAT

ATGCAAATCAATGAATGTAATTCATAATATAAACAGAACTAAACACAAAAACCACG
TGATTATCTCAATAGACACAGAAAAGGCCTTCAAAAAAATT 


Sequence ID - 639 nt: 624

GACACACGAGCATATTTCACCTCCGCTACCATAATCATCGCTATCCCCACCGGCGT 

CAAAGTATTTAGCTGACTCGCCACACTCCACGGAAGCAATATGAAATGATCTGCTG 

CAGTGCTCTGAGCCCTAGGATTCATCTTTCTTTTCACCGTAGGTGGCCTGACTGGC 

ATTGTATTAGCAAACTCATCACTAGACATCGTACTACACGACACGTACTACGTTGT 

AGCCCACTTCCACTATGTCCTATCAATAGGAGCTGTATTTGCCATCATAGGAGGCT 

TCATTCACTGATTTCCCCTATTCTCAGGCTACACCCTAGACCAAACCTACGCCAAA 

ATCCATTTCACTATCATATTCATCGGCGTAAATCTAACTTTCTTCCCACAACACTT 

TCTCGGCCTATCCGGAATGCCCCGACGTTACTCGGACTACCCCGATGCATACACCA 

CATGAAACATCCTATCATCTGTAGGCTCATTCATTTCTCTAACAGCAGTAATATTA 

ATAATTTTCATGATTTGAGAAGCCTTCGCTTCGAAGCGAAAAGTCCTAATAGTAGA 

AGAACCCTCCATAAACCTGGAGTGACTATATGGATGCCCCCCACCCTACCACAGAT 
TCGAAGAA 

Sequence ID 641 

CAAGATGACAAAGAAAAGAAGGAACAATGGTCGTGCCAAAAAGGGCCGCGGCCACG 
TGCAGCCTATTCGCTGCACTAACTGTGCCCGATGCGTGCCCAAGGACAAGGCCATT 
AAGAAATTCGTCATTCGAAACATAGTGGAGGCCGCAGCAGTCAGGGACATTTCTGA 
AGCGAGCGTCTTCGATGCCTATGTGCTTCCCAAGCTGTATGTGAAGCTACATTACT 
GTGTGAGTTGTGCAATTCACAGCAAAGTAGTCAGGAATCGATCTCGTGAAGCCCGC 
AAGGACCGAACACCCCCACCCCGATTTAGACCTGCGGGTGCTGCCCCACGTCCCCC 
ACCAAAGCCCATGTAAGGAGCTGAGTTCTTAAAGACTGAAGACAGGCTATTCTCTG 
GAGAAAAATAAAATGGAAATTGTACTTAA 

Sequence ID 642 

TGCTTGGCCCTCTACCTCCTGCCCTCTTCCTGTTCATCTCCCAACCACTGCACTCT 
TGATTTTTATACCACACAGAAGGTAAGAAAATTCTAGGAACCCTAAGGATCAATCC 
TCTCCATTTTCACTCAAATGCCTGGGGCCCAGCTCTGCAATGACTGACTCCAGGGC 
CTCTTTCCTCACTGCCAGCATAGAAGTCAGGGGAGCCAGCTGGGCCCTGCGGTCAG 
GAAGGTTCTCATTTTTGGAGCATTCCCTGAGCCCAGATCATAGGAGCAGCTGTCCC 
TGGTGGGACACAGGAGTCATGACTCCTACCCTCCACCCTCCACACCCACCAGGCAT 
TTAGCAGTCTGTCCTATGCAAGACAGATGAATTCTCAGCCAGGATACCTCAAGGCA 
GGCAAAGGTGAGTGGAGGGAAAATTCACAAACATTCAGGGTGTGTGGTGCTGGCAT 
CACCATGGCCAAATCCAAGAGGTCTTCCTGGAAGAGGGCCCAAACTGGAACCAAAA
GAATGCTGTCAGCAGTTGGAATAGAGCTGTGAATT 

Sequence ID 643 

CTTTCCAAGAGGAATCCTCGGCAGATAAACTGGACTGTCCTCTACAGAAGGAAGCA 

CAAAAAGGGACAGTCGGAAGAAATTCAAAAGAAAAGAACCCGCCGAGCAGTCAAAT 

TCCAGAGGGCCATTACTGGTGCATCTCTTGCTGATATAATGGCCAAGAGGAATCAG 

AAACCTGAAGTTAGAAAGGCTCAACGAGAACAAGCTATCAGGGCTGCTAAGGAAGC 

AAAAAAGGCTAAGCAAGCATCTAAAAAGACTGCAATGGCTGCTGCTAAGGCACCTA 

CAAAGGCAGCACCTAAGCAAAAGATTGTGAAGCCTGTGAAAGTTTCAGCTCCCCGA 

GTTGGTGGAAAACGCTAAACTGGCAGATTAGATTTTTAAATAAAGATTGGATTATA 
ACTCT 

Sequence ID 644 

CTTTGATAGAGAAGAAAATTCTCCTAGGATACAAGAGCCTCAACATTTTAAAGATT 

TTCTGCATCTCAAAAGCGTAGGCTCCTTGCTGGGCAAGGTGAGCCTCTGTGAGTCC 

TCATAGGACCGAGCAAATCTGATTCACCCCAGAAAATCCAATATCGAAGCTGAGCT 

TTGGCCTGAGCGGGTTCCATTTCCTCCCCAGATCCTATTTAGGAAGTGTCTCCTGA 

CAACCTCCAAAAGGTGCTAACATGCAACGTTCTGAAGGGTTATTGCTCAAAAACAA 

GATTTTCCTTGTGGTCAAGACTCTGCGAGCCTCGAACACGATGAATCCGCTCGAAT 

GGGCTTGGGCTTTGCCCGGGTGGCGCACGCTCACACGCTGGAAGCACAGCTTTGAC 

GATCTCCACACACGCACAGGCACACACGCCACAGATGATGCCGGCTCATTCTCAGG 

GGGTGTCTAAGTTCTGCTTTAAATATTTACCCCCTAATTGTACAAACAATAGGGGC 
ATGAGCCTGGTACTCGATAAATGGGGACTTNCTTAAAA 

Sequence ID - 645 nt: 649

CTACAGCCTGGGCAGCGCGCTGCGCCCCAGCACCAGCCGCAGCCTCTACGCCTCGT 

CCCCGGGCGGCGTGTATGCCACGCGCTCCTCTGCCGTGCGCCTGCGGAGCAGCGTG 

CCCGGGGTGCGGCTCCTGCAGGACTCGGTGGACTTCTCGCTGGCCGACGCCATCAA 

CACCGAGTTCAAGAACACCCGCACCAACGAGAAGGTGGAGCTGCAGGAGCTGAATG 

ACCGCTTCGCCAACTACATCGACAAGGTGCGCTTCCTGGAGCAGCAGAATAAGATC 

CTGCTGGCCGAGCTCGAGCAGCTCAAGGGCCAAGGCAAGTCGCGCCTGGGGGACCT 

CTACGAGGAGGAGATGCGGGAGCTGCGCCGGCAGGTGGACCAGCTAACCAACGACA 

AAGCCCGCGTCGAGGTGGAGCGCGACAACCTGGCCGAGGACATCATGCGCCTCCGG 

GAGAAATTGCAGGAGGAGATGCTTCAGAGAGAGGAAGCCGAAAACACCCTGCAATC 

TTTCAGACAGGAAATCCAGGAGCTGCAGGCTCAGATTCAGGAACAGCATGTCCAAA 
TCGATGTGGATGTTTCCAAGCCTGACCTCACGGCTGCCTTGCGTGACGTACGTANC
AATATGAAAGTGTGGCTGCCAAAAACCTTGCAG 

Sequence ID - 646 nt: 600 

GAGATGTCTCGCTCCGTGGCCTTAGCTGTGCTCGCGCTACTCTCTCTTTCTGGCCT 

GGAGGCTATCCAGCGTACTCCAAAGATTCAGGTTTACTCACGTCATCCAGCAGAGA 

ATGGAAAGTCAAATTTCCTGAATTGCTATGTGTCTGGGTTTCATCCATCCGACATT 

GAAGTTGACTTACTGAAGAATGGAGAGAGAATTGAAAAAGTGGAGCATTCAGACTT 

GTCTTTCAGCAAGGACTGGTCTTTCTATCTCTTGTACTACACTGAATTCACCCCCA 

CTGAAAAAGATGAGTATGCCTGCCGTGTGAACCATGTGACTTTGTCACAGCCCAAG 

ATAGTTAAGTGGGATCGAGACATGTAAGCAGCATCATGGAGGTTTGAAGATGCCGC 

ATTTGGATTGGATGAATTCCAAATTCTGCTTGCTTGCTTTTTAATATTGATATGCT 

TATACACTTACACTTTATGCACAAAATGTAGGGTTATAATAATGTTAACATGGACA 

TGATCTTCTTTATAATTCTACTTTGAGTGCTGTCTCCATGTTTGATGTATCTGAGC 

AGGGTGCTCCACAGGTAGCTCTAGGAGGGCTGGCAACTTA 

Sequence ID 647 

CGAATGTGCAGGTTTGTTACATAGGTATATATATGCCATGATGGAAATATTTATTT 
TTTTAAGCGTAATTTTGCCAAATAATAAAAACAGAAGGAAATTGAGATTAGAGGGA 
GGTGTTTAAAGAGAGGTTATAGAGTAGAAGATTTGATGCTGGAGAGGTTAAGGTGC 
AATAAGAATTTAGGGAGAAATGTTGTTCATTATTGGAGGGTAAATGATGTGGTGCC 
TGAGGTCTGTACGTTACCTCTTAACAATTTCTGTCCTTCAGATGGAAACTCTTTAA
CTTCTCGTAAAAGTCATATACCTATATAATAAAGCTACTGATTTCCAAAAA 

Sequence ID 648 

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 

Sequence ID - 649 nt: 425

CAAAAAAACGAAGAAAAGTGACGACAGTCTGAGGGACTTATGGGAGATCATCAAGT 
GAACCACTATATGTGTAATGTAAGTCTTGGAATGAGAAGAGAGAAGGAGAAGGAGG
AGAGAGCTTATTTGTAGAAATAATGGCTGAAAACATCCCAAACTTTCCTTTTTTTG 
AGGAAAGAAATAGGCATACAAGTTCAAGAAACTCAAGGAACTCCAGAGAGGACAAT 
TCTAAAGACACCCCCTCTAACATACATTATAATCAAATTGTCAAAAGTAAAATACA 
AAGAGAATCTTTTAAATTGACAAGAGAAAAGCAGCTGGTCACGTTCAAGGGAGTTC 
TATAAGAATTTCAGCAGATTTCTCAGCAGAAACCTTGCAGGCCAACAGGCAGTGGG 
ATGATACATTCAAAGTGCAAAAAAAAAAAAAAA 

Sequence ID 650

CGAGAGTTTACCAGTNGCCTAATAATGC^TAAAAAATGCTTTGAGATAGCTAACN 

GCCCATAAAACAAACTCAAATTGCTTATAAAGTTTCTTCCCATGTTCCCATTTGAT 

■GAAAAGTCTTACATCACATATAACTGGGAAGCAGGWTCCCTCCTCmTTTTCAGA 

CATTTTGAAAGGATGACAGTTCTGTTTGTTAGATGAGTAAACCTCTATATTCATAA 

GTTCTAAAATCCTTCATTATGAGGGATTCAAAGTATTTATAAAAACACTGCCCTCT 

AAAAATTTCCTCAGATCTGAAGTATGGNCTTGGNCCTGAATATACAGTGTTATCCT 

ATGTTTAAAAGGGTGATCCAGACATGAGACGCAACTAGTTGGTGCATAAGAAGGCC 

CCACTTGGCTATTTCATATCTACCTACAATTGACCAAAAAAAATTTTTTAGGCCAG 

CAATTATTATTTAGCTTCGCTCTTTCTAGTGCAAGAAACTGCAGGCTGGATCAGTA 

GTTCAACAGCTAAACAGTCATAAAATAGTCATTGGCATGTTAAATTTCTTTCAATG 

CTTCAAAGATAAATTCCAATTCTATTTACTTATTCATTGNGACNGNATTACTAAAC 
AGGTAAGGATGGGAATA 

Sequence ID - 651 nt: 251

CTTTGGGAGGCCGAGGCGGGCGGATCACTTGAGGTCAGGGGTTCGAGACCAGTCTG 

GCCAACATGGTGAAACCCCAACTCTACTAAAAATACAAAAGTTAGCCAAGTGTGGT 

GGCAAGTGCCTGTAATCCCAGCTACTCGGGAGGCTGAGACAGGAGAATCACTTTGA 

ACCTGGGAGGCGGAGGTTGCAGTGAGCCAAGATCGTGCCACTGCACTTCAGCCTGG 
GCAACAGAGCAAGATTCCGTCCATCTC 

Sequence ID 652 

CTTTCTTCAGCCTTGC^GACACCTAAAC^TCATGTAATTACCTAAGGAATTCCCAA 

GTGCCTCTTCCAGGTTATACGTGTAAATAGCTGTTTTTATGCAAGATTAGTTAGAT 

ACTGCTCTTTACAGGATGAGTGGTGTTGTCTTTGGCTGGGGGGGNCTTAAATGTGT 

TTCTAATGTGTGTGTCAAATAATTACCTGTTAAACAGACTGCCAATCTGGCTGAAG 

CCAATGCTTCTGAAGAAGATAAAATTAAAGCAATGATGTCGCAATCTGGCCATGAA 

TACGACCCAATCAATTACATGAAGAAACCTCTAGGTCCACCACCTCCATCTTACAC 

GTGTTTCCGTTGTGGTAAACCTGGACATTATATTAAGAATTGCCCAACAAATGGGG 

ATAAAAACTTTGAATCTGGTCCTAGGATTAAAAAGAGCACTGGAATTCCCAGAAGT 

TTCATGATGGAAGTGAAAGATCCTAATATGAAAGGTGCAATGCTTACCAACACTGG 

AAAATATGCAATCCAACTATAGATGCAGAAGCATATGCAATTGGGAAGAAAGAGAA 
ACCTCCTTNTTACCAGAGAGCCATCTTNTTTCT 

Sequence ID 653 

GTTGTGACTCGTTGGCATGTGATCTGAAGTTCCTGCCCTGCAGCTGACGAGCCAGT 
GTTTCAATAATTAAAAAGAACTCAACTCACTGTCCTCGTGCCTTGAATTTGATCAT 
TGCGCTTTGCATGTATGTATCACAATACCACATGTACCCCATAAATATGTACAAAG
ATTATGTGTCAATAAAAAACAAAAATTAAAATCCCAATTTTTA 

Sequence ID 654 

GTTGCTAGTAGCGGCAGGAAGATGTCAGGCTCACTTTCCTCTGATTCCCGAAATGG 

GGGGAACCTCTAACCATAAAGGAATGGTAGAACAGTCCATTCCTCGGATCAGAGAA 

AAATGCAGACATGGTGTCACCTGGATTTTTTTCTGCCCATGAATGTTGCCAGTCAG 

TACCTGTCCTCCTTGTTTCTCTATTTTTGGTTATGAATGTTGGGGTTACCACCTGC 
ATTTAGGGGAAAATTGTGTTCTG 

Sequence ID 655 

GTCCCCGGGAATCGCGGCCGCGTCGACGGTTTATTTTCAGTGCTTGAAGATACATT 
CACAAATACTTGGTTTGGGAAGACACCGTTTAATTTTAAGTTAACTTGCATGTTGT 
AAATGCGTTTTATGTTTAAATAAAGAGGAAAATTTTTTGAAAAAAAAAAAAAAAAA 

AAAAAAAAAAAAAAATTTTT

Sequence ID 656

TAGAGGCCTGAATAGGTAGACAATGGCAGCAGCGTTTTTAATCACAGTCCTATTCA 
TGCCCTAATTCGGGAGTGATGATTAAAGGACATTAGAGGGAGCACTTTGACATCTG 
ATCCTTTGAACTGACGTCTGTGCAGGCTGCACTCCATAGAGCTCACTTGGCCAAAC 
TGATTTCCTTAAATAAAGTGCTGTGATTTCCAATGTAGGAAATATTACATTAGAGC 
CTATTGAAATGATTAGGAATTGAGGAGCTTTTCTTTAGGTGGGAATGTGGTGTATG 
CTGTATACTCACAAAAGTGAGATCATTAATATTGCATGTACTACTTTGAATATCAG 
GGACCACAGAGAAATAGCATGAGAAACGCCTTCCTGCAGTCATGCACTTAAAATGA 
ATATGAACAAAAATGTGGAACTCTGCTGTCATAGCTCTCCG 

Sequence ID 657 

GGGGCTTCTGCTGAGGGGGCAGGCGGAGCTTGAGGAAACCGCAGATAAGTTTTTTT 
CTCTTTGAAAGATAGAGATTAATACAACTCTTAAAAAATATAGTCAATAGGTTACT 
AAGATATTGCTTAGCGTTAAGTTTTTAACGTAATTTTAATAGCTTAAGATTTTAAG 
AGAAAATATGAAGACTTAGAAGAGTAGCATGAGGAAGGAAAAGATAAAAGGTTTCT 
AAAACATGACGGAGGTTGAGATGAAGCTTCTTCATGGAGTAAAAAATGTATTTAAA 
AGAAAATTGAGAGAAAGGACTACAGAGCCCCGAATTAATACCAATAGAAGGGCAAT 
GCTTTTAGATTAAAATGAAGGTGACTTAAACAGCTTAAAGTTTA 



Sequence ID 658 
GACCTTTGAGAAAATTAATTTAAATCCTAGAACTTTGGGTGAACCGAAGAAATTTA
TAATATTTGTTTAGTTAATAACAGATAAAAAGGAAAGATTCAAGCCTATTGGATGA 
GAATTTGTACATTATTTTAGAGCTAATAATAATGGTTTTCAGTTTAGTGAGGATTT 
AAAAAATGTTTTTGAATCAAACTTTTTTTCTTTATAATCCTTTTTAACTAACTCAG 
GAAATAAGGTATTATGAAATCCACACACTGTTACCTCCTTAAAGTATGAGGATACT
TCCCACTGTTTGGTCCACTAGTGGCTGATTATTTTGTTTGTGGATTATTTGTAATT 
TTCTTTTTAATTCTTCCTTAAAGAGCATGGCATTTGGAGTCACAGACCTATATTTG 
AATCCTGTCATTTACTAGCGTTTTGACCTTGAACAATTATGCTCAGAGTCTCAGTT 
TTTTCTTGTAAAGTGATGATGATACTACTTAACTCACAGGGTTGTAGTGAAGATCA 
AATGAGATCATGTCTGTANAACACCCTGCCCGGCACTCAATAAGTATTAATAGGAA
CCCATATACCTC

Sequence ID 660 

TGTTTTTATTTTTTAAAAGGTATAAACACCAAAAAAAAAATTAACATTGTATGAAG 
ATGGAAAATAAGAAGATGCACTTTCTGTAACTTTGTCTAAGGATTTAAATTACTAA
CTTATGAACTCCAATTTGAATTGAACTTAACTATCGGCTTTCTTACTGGTAAAATT 
ATATGGTTTATTTTAAATGCGTACATATTGACGAATGGCCTCTGAAAAAGCACATT 
TTAGATACTGAAATTGAAGGAAAGAAAATGCATCTTCAAACATTTTTTGGAATCTC 
ACCACATATACTTTGTTANATTTGTGTATTGTAGGGTGTTTGTTTTGTATTTTTGT 
ATTGTATATGAACTTTTTTTAAATGTGACAGTTAAACACATCTTTAAAAGCATAGT
CACAGACAAAAGCATACAGTATAAAAATTTCCTTGAAAACTCCTACAATATTATAT 
TTGGAGGCAGCTTCAGACTGTTTTATTGG 

Sequence ID 661

CTCTGGCACACATTAGTTCCTCTTATATTACATTGATATAAGCAAGTCATATGGAT

TTATCTGAGTGTAAGGAGAGCTGGAAAAAATAGTTTCTAGCAGGTCAGCCACCTCC 
CAGTGAGGGCTGCATACCATAGAAGGGGAGAATGAATTTTGGGAAAACAGGTAATT 
ATCTCTGTCACAGAAGGGGATGAAAAGTATGGTAGTTACNCAAGTTANACATCTGT 
ATGGAAAATACCACTTGGTTCTACAAATGNGG 


Sequence ID - 663 nt: 627

GCCTCCCGGGTTCAGGGATTTCTCCTGCCTCAGCCTCCTGAGTGGCTGCATTGCAG 
GCACCTGCCACCACGCCTTGCAAATTTTTGTGTTTTTAGTGGAGATGGGGTTTTGC 
CATGTTGGCCAGGCTGGTCTCGGACTCCTGACCTCAGGTGATCCGCCCGCCTCAGC 

CTCCCAGAGGGCTGGGATTACAGGCGTGAGCCACTGTGCCTGGCCCCAAGTTTTGC

ATCTTTTAATGCCCTCTGAACAAATACATAGAGAAAACTCTCAGAACAATTAAAAC 
CTGCAGAGGAACAGTGTCCTCCATGTCTTAGGTTTCAAGTTTGCCTCTAAAATTCT 
AATCCATATTTTTCTACTTCTCAGATAATTTATGTGTGTGTACTCTTCCTAGACGT

ACAAGAGACTTTTTAATGCTAAATATTTGTCAGTGCTTAACAAAAACTCAATTTCA 

CATTACTCATATTGTTTTTGTTTTAATTGAATGTGAATTAAATTTTTATTAGTTAT 

TTGATTTGGAATGTTATGTATGCCATTAACACTATTAGGGGAATCTCTAGCATTTC

TGTATTTTTAAAGAATTTGATTCTTTTGTANATTCTGCCTGTGTGGCATTTTAAAC 
ATGTGTGACAT 

Sequence ID - 665 nt: 345

ACCGGCGACATGGCCAAACGTACCAAGAAAGTCGGGATCGTCGGTAAATACGGGAC 

CCGCTATGGGGCGTCCCTCCGGAAAATGGTGAAGAAAATTGAAATCAGCCAGCACG 

CCAAGTACACTTGCTCTTTCTGTGGCAAAACCAAGATGAAGAGACGAGCTGTGGGG 

ATCTGGCACTGTGGTTCCTGCATGAAGACAGTGGCTGGCGGTGCCTGGACGTACAA 

TACCACTTCCGCTGTCACGGTAAAGTCCGCCATCAGAAGACTGAAGGAGTTGAAAG 

ACCAGTAGACGCTCCTCTACTCTTTGAGACATCACTGGCCTATAATAAATGGGTTA 
ATTTATGTA 



Sequence ID - 666 nt: 252

ATAATTCAGAACTTCTTCATATGCTCGAGTCTCCAGAGTCACTCCGTTCTAAGGTT 
GATGAAGCTGTAGCTGTACTACAAGCCCACCAAGCTAAAGAGGCTGCCCAGAAAGC 
AGTTAACAGTGCCACCGGTGTTCCAACTGTTTAAAATTGATCAGGGACCATGAAAA 
GAAACTTGTGCTTCACCGAAGAAAAATATCTAAACATCGAAAAACTTAAATATTAT 
GGAAAAAAAACATTGCAAAATATAAAAT 

Sequence ID 669 

TTACTTTTAACCAGNGAAATTGACCTGCCCGTGAANAGGCGGGCNTGACACAGCAA 

GACGAGAAGACCCTATGGAGCTTTAATTTATTAATGCAAACGGTACCTAACAAACC 

CACAGGTCCTAAACTACCAAACCTGCATTAAAAATTTCGGTTGGGGCGACCTCGGA 

GCAGAACCCAACCTCCGAGCAGTACATGCTAAGACTTCACCAGTCAAAGCGAACTA 

CTATACTCAATTGATCCAATAACTTGACCAACGGAACAAGTTACCCTAGGGATAAC 
AGCGCAATCCTATT 

Sequence ID 670 

GGCTGATTCCTGAGCTATAAAAGCATAATTGCTTTATATTTTGGATCATTTTTTAC 
TGGGGGCGGACTTGGGGGGGGTTGCATACAAAGATAACATATATATCCAACTTTCT 
GAAATGAAATGTTTTTAGATTACTTTTTCAACTGTAAATAATGTACATTTAATGTC 
ACAAGAAAAAAATGTCTTCTGCAAATTTTCTAGTATAACAGAAATTTTTGTAGATG 
AAAAAAATCATTATGTTTAGAGGTCTAATGCTATGTTTTCATATTACAGAGTGAAT 




TTGTATTTAAACAAAAATTTAAATTTTGGAATCCTCTAAACATTTTTGTATCTTTA 
ATTGGTTTATTATTAAATAAATCATATAAAAATT 

Sequence ID 671 

CAGGAAGTCACCTGGGATTGGCTGCCTCACCCACTCACAGTGCCATCCCTGCCCCA 

GGCCTCCCAGTGGCAATTCCAAACCTGGGTCCCTCCCTGAGCTCTCTGCCTTCTGC 

TCTGTCTTTAATGCTACCAATGGGTATTGGGGATCGAGGGGTGATGTGTGGGTTAC 

CTGAAAGAAACTACACCCTACCTCCACCACCTTACCCTCACCTGGAGAGCAGTTAT 

TTCAGAACCATTCTACCTGGCATTTTATCTTATTTAGCTGACAGACCACCTCCACA 

GTACATCCACCCTAACTCTATAAATGTTGATGGTAATACAGCATTATCTATCACCA 
ATAACCCTTCAGCACTA 

Sequence ID 672 

CAGGAAGTCACCTGGGATTGGCTGCCTCACCCACTCACAGTGCCATCCCTGCCCCA 

GGCCTCCCAGTGGCAATTCCAAACCTGGGTCCCTCCCTGAGCTCTCTGCCTTCTGC 

TCTGTCTTTAATGCTACCAATGGGTATTGGGGATCGAGGGGTGATGTGTGGGTTAC 

CTGAAAGAAACTACACCCTACCTCCACCACCTTACCCTCACCTGGAGAGCAGTTAT 

TTCANAACCATTCTACCTGGCATTTTATCTTATTTAGCTGACAGACCACCTCCACA 

GTACATCCACCCTAACTCTATAAATGTTGATGGTAATACAGCATTATCTATCACCA 

ATAACCCTTCAGCACTAGATCCCTATCAGTCCAATGGAAATGTTGGATTANAACCA 

GGCATTGTTTCAATANACTCTCGCTCTGTGAACACACATGG 

Sequence ID 673 

GGGTTTTCTTTCGGAAGCGCGCCTTGTGTTGGTACCCGGGAATTCGCGGCCGCGTC 

GACTGCTAAACAGAATACTGCTATTTTGAGAGAGTCAAGACTCTTTCTTAAGGGCC 

AAGAAAGCCACNTGiraCCCTNGGNCTAATCTGGCTGAGTAGTCAGTTATAAAAGCC 

NTAATNGCTTNNTNTTTGGNNTCNTTTTT1^<^GGGGNCGGNCTTGGGGGGGGTTG 

CNTCCAAAGATANCATNTNTTTCCAACTTTNTNAANNNAANNGTTTTAAAATCCCT 

TTTCCNCCNGAAAANANNGCCCTTTAAGNGCCnSTCAAAAAAAAANNGTNTTCTGCAN 

NTTTTCTANTATNACAAANNTTTTNGTAGAANAAAAATTTTTTTTTAGNGGCTACC 

CTTTNTTTNTTANNCANMGGAGTTTNTTTTTACAAAAAAAAAANATTGGGNCCCCT 

CCACAACCTTGGGTCTNTAATNGGGGGGTTTTTAAATAAANCNTNTNTAAATCCCC 
CNNNNTONNNNCNI^^ 

TCCCCCNCCCTTTTTCTTCCTGCCGGCCCCAATTTAAGCCCNGGCGCTTGGGGCAA 

ATCCCCCTTTAGNGGGGGGGTTTANAAAAACCNGGGGCGGGGNTTTAAAACCNCGG 
GGNNNGGGGAA 
Sequence ID 674

ACCTCTAGCATCACCAGTATTAGAGGCACCGCCTGCCCAGTGACACATG-TTTAAC 

GGCCGCGGTACCCTAACCGTGCAAAGGTAGCATAATCACTTGTTCCTTAATTAGGG 

ACCTGTNTGAATGGCTCCACNAGGGTTCACTTGTCTCTTACTTTTAACCAGTGAAA 
TTGACCTGCC 

Sequence ID - 675 nt: 591

GTATAGAAAATAATGTCCCCAGNGCATAGAAAAAATGAGTCTCTGGGCCAGTGAAT 
ACAAAACATCATGTCGAGAATCATTGGAAGATATACAGAGTTCGTATTTCAGCTTT 
GTTTATCCTTCCTGTTAAGAGCCTCTGAGTTTTTAGTTTTAAAAGGATGAAAAGCT 
TATGCAACATGCTCAGCAGGAGCTTCATCAACGATATATGTCAGATCTAAAGGTAT 
ATTTTCATTCTGTAATTATGTTACATAAAAGCAATGTAAATCAGAATAAATATGTT 
AGACCAGAATAAAATTAATTATATTCTGGTCTTCAAAGGACACACAGAACAGATAT 
CAGCAGAATCACTTAATACTTCATAGAACAAAAATCACTCAAAACCTGTTTATAAC 
CAAAGAATTCATGAAAAAGAAAGCCTTTGCCATTTGTCTTAGAAAGTTATTTTTTA 
AAAAAAAATCATACTTACTATTAGTATCTATGGAAGTATATGTAACAATTTTTATG 

TAAAGGTCATCTTTCTGTGATAGTGAAAAAATATGTCTTTACT^AGTTGAAATGAA 
TACTTTCTGNCTTTGCTAATGGATAGTTATT 

Sequence ID 676 

CTCAATTCTACTAAAAAGCCCCCCAAGAAAAGCGAATGAGAAAACAGAGTCATCCT 
CTGCACAGCAAGTAGCAGTGTCACGCCTTAGCGCTTCCAGCTCCAGCTCAGATTCC 
AGCTCCTCCTCTTCCTCGTCGTCGTCTTCAGACACCAGTGATTCAGACTCAGGCTA 
AGGGGTCAGGCCAGATGGGGCAGGAAGGCTNCGCAGGACCGGACCCCTAGACCACC 
CTGCCCCACCTGCCCCTTCCCCCTTTGCTGTGACACTTCTTCATCTCACCCCCCCC 
TGCCCCCCTCTAGGAGAGCTGGCTCTGCAGTGGGGGAGGGATGCAGGGA 

Sequence ID 679 

GNANCNTTTCCTNTCGNAAANCGCGCCTTGTGTTGGTACCCGGGAATTCGCGGCCG 
CGTCGACAAAAAAAAAAAAAAAAAAAAAAAAAAAAANTNTAGACTCGANCAAGCTT 
ATGCANGCNTGCGGCCGCAATTCGAGCTCGGCCGACTTGGCCAATTCGCCCTATAG 
NGAGTCGTATTACAATTCACTGGCCGTCGTTTTACAACGTCGNGACTGGGAAAACC 
CTGGCGTTACCCAACTTAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGT 
AATANCGAANAGGCCCGCACCGATCGCCCTTCCCAACAGTTGCGCAGCCTGAATGG 
CGAANGGAAATTGTAAGCGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGTTA 
AATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAAAATCCCTTATAAATCAA 
AAGAATAGACCGAGATAGGGTTGAGNGTTGTTCCAGTTTGGAACAANAGTCCACTN 
TTAAAGAACGNGGACTCCAACGTCAAAGGGCGAAAAA^



CCGTCTATCAGGGCGATGG 
GTTTTTTGGGGTCGAGGNGCCGT A a , _ 

cACT^Tcao^cccr^GAaccccca.TTT^oc^ccaS^c 



CCCACTACGTGAACCATCNCCCTAATCAAi 
CACTAAATCGGAACCC 

GGCGAACGTGGCGAAA 



Sequence ID 682 

CACCTGCAGTCCAAGTACATCGGCACGGGCCACGCCGACACCACCAAGTGGGAGTG 

GCTGGTGAACCAACACCGCGACTCGTACTGCTCCTACATGGGCCACTTCGACCTTC

TCAACTACTTCGCCATTGCGGAGAATGAGAGCAAAGCGCGAGTCCGCTTCAACTTG 

ATGGAAAAGATG^TTCAGCCTTGTGGACCGCCAGCCGACAAGCCCGAGGAAAACTG 
AAACTTTGCTTAACNACCGAATGGNGGGGAKCITTTCC^CGOTTTT ^ 

Sequence ID 683 

TTGGTTTC^TACTGNTGGGGNTTGAATGNTCCCTNCAACACTNATGTTGANACTTA 
ATCCCTAATGNGGCAATACTGAAAGGTGGGGCCTTTGAGATGTGATTGGATCGTAA 

GGTGGCTTTATAAGAAGAGGAAAAGAGAACTGAGCTTGCATGCCC 
Sequence ID - 684 nt: 545

GTGGAAGNGACATCGTCTTTAAACCCTGCGTGGCAATCCCTGACGCACCGCCGTGA 

TGCCCANGGAAGACAGGGCGACCTGGAAGTCCAACTACTTCCTTAAGATCATCCAA 

CTATTGGATGATTATCCGAAATGTTTCATTGTGGGAGCAGACAATGTGGGCTCCAA 

GCAGATGCAGCAGATCCGCATGTCCCTTCNCGGGAAGGCTGTGGTGCTGATGGGCA 

AGAACACGATGATGCGCAAGGCCATCCGAGGGCACCTGGAAAACAACCCAGCTCTG 

GAGAAACTGCTGCCTCATATCCGGGGGAATGTGGGCTTTGTGTTCACCAAGGtAGKSA 

CCTCACTGANATCAGGGACATGTTGCTGGCCAATAAGGTGCCAGCTGCTGCCCGTG 

CTGGTGCCATTGCCCCATGTGAAGTCACTGTGCCAGCCCAGAACACTGGTCTCGGG 

CCCGATAAGACCTCCTTTTTCCAGGCTTTAGGTATCACCACTAAAATCTCCAGGGG 
CACCATTGAAATCCTGAGTGATGTGCACTGATCAAGACTGG 

Sequence ID 685 
GGAAAGGGCCATTT' 



ATGTCCTTTTTTGAATAGCTGTTCTAATTATTATATATTi 



TATTGCCTAAAACCACCTGGNTTTTNAGGTAACAGTTCCAAC 

CAGCTGATTAATAGGAG 

TACTTGATAGGTGGACTGTGTCAGGTAGCCTCAGGCAATCCTACTTCAACAAGCTG 

TCAGGGAGCCATGCCATGCTTCTTTATGACATAGGTGAATTTGATAGGCTCACTAG 
CAGAACATGGGATCACAAGGTGGAACCNTTCCNTTT 
Sequence ID 686
GACCCCTTCCTTACACCTTAT^ 

TTATACAAAAATTAACTCAATTTTATTATGTTGTATTAAATTAAGTTGGGTTTAAT 
TAAGATGGATTAAAGACTTAATTATAAGACCTAAAACCATAAAAACCCTAGAAGAA 
AACCTAGGCCATACCATTCAGGACACGGGTATGGGCAAAGACTTCATAACTAAAAC
ACCAAAAGCAATGGCAACGAAGTCCAAATAGACAAATTGGACCTGATTAAACTAAA 
GAGCTTCAGCACAGCAGAAGAGACTATCGTCAGAGTGAACAGGCAACCCACAGAAT 
GGAAGAAAATTCTTGCAATCTATCCATCTGACAAGGGGCTAATATCCAAAATCTAC 
AAAGAACTTAAACAAATTTACAAGGAAAAACACAAACAACCCCATCA^ 
CTAAGGATGTGAACAGACACTTCTCAAAAGAAAACATTTATGCAGCCAACAAACAT
GAAAAAAAGTTCATCATCACTGCTCATTAGAGACATGCT^ 
GATCCCATCCCACACCAGTTAGAATGGGAATCATTAAAAATGT 

Sequence ID - 687 nt : 268 

TTTATGTGTTTTTGCTTGGGGGGCGCTGGGCCTAGCCCAGAGTAGTGCTTGCTCCC
CCTGCCTTGTCCCACCAGGGAGGCAGCAGACTCAGGCCCTCCATGGTCCTCTTTGT 
CATTTTGTTGACATGCATTCCTCCTTTTGTCATCTTGTTGGGGGGAGGGGATTAAC 
CAAAGGCCACCCTGACTTTGTTTTTGTGGACACACAATAAAAGCCCCGTTTATTTG 
TAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 


Sequence ID - 688 nt : 569 

CTTTAGCCAGCCTGATCAGAAAAAAACAAAAGAAGAGGAAAGACGTAGATTACCAA 
CATCAAGAATGTGAGTTATGATATCACTACAGACTCTCCAGGTATTAAAAGCATAA 
TTAGAGAATGATATGAGCAGCTATATGCAAATAAGTTCAACATTGGACAAATGGAC 

AAATTTCTTGAAAGATAAATTATGAAATTTCATTCTGAAAGAACTACATGACCTTA

ATTGTCTTACATCTATTAAATAAGTGGAAATTGTAGTTTAGAAACTTTCCCACAAA 
GAAAACTCTAGGCCCAGATGGCATCAAAATAATATTCAGATGAATGAAATGGAGAA 
AGGATAGCCTTTTCAACAAATGGTGGTGGAAC^TTGGATTTCCATATGCAAAAAA 
ATAGAGATGGACGCAGAGGTGTGTGCTTAGGAGGCTGAGGTGAGAGGATTGTTTGA 

GGCCAGCCTGGGCAACATAGCAAGACCCCATTTCAAAAACAAAAATAAAGAACTTG

TAGCCTTACCTTGTGCCATATTATGAAAATGTATCATAGGCTTAAATGTGAT^AGGT 
AAAACAAAA 

Sequence ID 689 

CGCAGGGGCTTCTGCTGAGGGGGCAGGCGGAGCTTGAGGAAACCGCAGATAAGTTT
TTTTCTCTTTGAAAGATAGAGATTAATACAACTACTTAAAAAATATAGTCAATAGG 
TTACTAAGATATTGCTTAGCGTTAAGTTTTTAACGTAATTTTAATAGCTTAAGATT 




TTAAGAGAAAATATGAAGACTTAGAAGAGTAGCATGAGGAAGGAAAAGATAAAAGG 

TTTCTAAAACATGACGGAGGTTGAGATGAAGCTTCTTCATGGAGTAAAAAATGTAT 

TTAAAAGAAAATTGAGAGAAAGGACTACAGAGCCCCGAATTAATACTAATAGAAGG 

GCAATGCTTTTAGATTAAAATGAAGGTGACTTAAACAGCTTAAAGTTTAGTTTAAA 

AGTTGTAGGTGATTAAAATAATTTGAAGGCGATCTTTTAAAAAGAGATTAAACCGA 
AGGTGATTAAAAGACCTTGAAATCCATGACGCAGGGAGAATTGC 

Sequence ID 690 

CGAAAAGCAAATATAACTTGCCACTAACCAAGATCACCTCTGCAAAAAGAAATGAA 

AACAACTTTTGGCAGGATTCTGTTTCATCTGACAGAATTCAGAAGCAGGAAAAAAA 

GCCTTTTAAAAATACCGAGAACATTAAAAATTCGCATTTGAAGAAATCAGCATTTC 

TAACTGAAGTGAGCCAAAAGGAAAATTATGCTGGGGCAAAGTTTAGTGATCCACCT 

TCTCCTAGTGTTCTTCCAAAGCCTCCTAGTCACTGGATGGGAAGCACTGTTGAAAA 

TTCCAACCAAAACAGGGAGCTGATGGCAGTACACTTAAAAACGCTCCTCAAAGTTC 
AAACTTAGATTTCAGATTT 

Sequence ID 691 

CCGGTCTCTACACAATATATAGAAATCTGGGCATGGTGGTGCCTGGCTGTAGTCTC 

AGCTACCTAGTTGGGTGAGGTGGGAGAGTCGCTTGAGTCCTGGAGGTTGAGGCTGT 

AGTGAGCCAGGGCTGCACCACTGCATTCCAGCCTGGGTAACAGAGTGAGACCCTGT 

CTCAAAAAGAAAAAAAAAAATTGCTAATTTTAACAAATCACAAAACTGACTCAGGC 

AAGTTGTCTGACTCAAAAGCCCTTGAAAAACCATCAAAGACAGTAGAATGTTAACT 

GGTCATTTACGTAAAATAGTGTTCATTAAATTTTTGGTTCATTTAGGATAATCATT 

TTAAATGAGACTGTATTTGAGACTGTATACACATACATATACATGTTTACACACAT 

ATACGTACAATATATGTACATTCTATCTAAAAGATCATACATGTGTGTACATATAT 

GTTTTTAAAAGTCAAACTGACATATTAATGGAAACAGTGCTTACATCTCTGGTAGT 

GATTTTCTATTAGCAGCAGCCCTACATATGCTGCGTCTCTGAACAGCATGTCAGTG 

CCATGACTGTCTAAACATGCAAATATGACTGACAGACTCTTGAGACAGCTTTCACC 
TTG 

Sequence ID 692 

AATTCGNGGCCGCGTCNNCCTANGAGGCACCAGGAAATCCCGCGGGGTGGCCCATG 
CAGACCAGGCGCACGTGGCTCATGGGGCANAATTGCCAAGGACAGCTCACGACAGT 
GCCACCTTCTCACCATTCCAGCCAAGGAGAGATGTGACGTTGGAACTGCTCTGGCA 
CTTCTGTCAAGCCTCCCCCGCCCCAATTGCCTTGAGATCTCTGCTCTTTGTCAGAG 
ATTTGCAAAGACTCACGTTTTTGTTGTTTTCTCATCATTCCATTGTGATACTAAGA 
AACTAAGAAGCTTAATGAAAAGAAATAAAATGCCTATGTTGTTGTTCT 
Sequence ID 693

CTAGAACCCATGACTCCTAGGTCTTATACTGCAACCACAGTATCAGCAAATAATCT 

TTCATAAGGGGATTATTCTCTGATTAACAGGAAATACAGGAATTTAATTTGTGAAC 

ACGCTAGGTAGAAGCAGAAACCCAAATCCAAATCCAAATTTAAACATTTAAAA^ 

ATTCTATAACTAAGATCTAACAGTCATTTTCTTCCCAGTAAGAAATAACCAAAGCA 

TGCTAAAAATCACTGGACTAAATTGGTGTCAAAACTGCCACATTGCCAGGCATGGG 

GGGGTCATACTTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGAAAATTGCTTGA 

GGCCAGGAGTTCGAAACCAGCCTGGGCAACACAGTGAGACCCCATCTCCACLAAAAA 

AAAAAAATTAAAAAACAAAACAAAAGATTAGCTGGGCM 

GTCCCAGCTACTCAGGAGCCTGAAGTGAGAGGATCACTGAAGCCCAGGAGGTAGAG 

CTATGACTGTAGTGAGCTATGACTGTGCCACTAGACTCCACCTGGGTGACAGGGGA 
CTC 

Sequence ID 694 

CGACTTCCATTTGTATTAATGGAATACTAAGTCCCTCTGTGATTTCTGAACCAAGC 
TATTCCTAGGCCTGAGTTTTATTTTGTTGACACAGAAATAAATTANAAGGCCAAGC 
GTGGTGGCATGTGCCTGTAGTCCTAGTTGCTGAGGTAAGAGGATTGCTTGAGCCCA 
GGAGTTCAAGGCTGCAGCAAGCTTTGATTGCGCCACTGCACTCCAGCCTTGGCGAC 
AGACTAAGACGCTGTCTCAAAAASlAAACAAAAA 

Sequence ID 696 

GGTTATCAATGAGATTAAGAGACAACTAGAGTAAAAACAAAAGAAAAGAAAAGAAA 

NGAAAACAACAGAAGCTCTATTAACTGACCTCTAACCAATACAACAGGTTAACTGA 

TGTTCTCCATTCTGTATATAAAAATCCCAGTGGACACCCACAACACAGGCTTCAGG 

CTTGTAGGACACTTTCTAGTTCATCTGAGCACTTTTGTTCTCAGCAGTT 

ATACTTAGCAACATTTGGTGCTTCCAAACCCATTTGTGCCTGTAGCACTTACTATT 

GAAATACATAATTTAATTAAATATTATATAAAGGAATGGAATACGAGTTGGACAAG 

AAAAAGAGTTAAATCTGAAGGTTAGGTAAAAAGAGCAACTTCTTTTCTCTGTTTTG 

CAGGTTGGCAAAATCATTTAAAAACAATTGGAAGTATTATATGTTCTGCATTAAGT 

TGTCATTTTACTTAAAAACTAGGCATCAAAGATGATGCATAATAAATTTAGTGTAT 

GCAAGAATGACTGCTTGGGACCTCAATATATGAATTCTTAATCCAAGGAAAGTCCT 

TGGCCTTACATTTAAAAGTCGGCAAATAAGTGTACGTTCATT 

Sequence ID 697 

GAACATTTAAAAATAATGCAAATAAGGCTGGGCGTGGGGGCTCACACCTGTAATCC 
CAGCACTTTGGGAGGCCGAGGCAGGCAGATCACGAGGTCAGGAGATTGAGACCATC 
CTGGCTAACACAGTGAAACCCTGTCTCTACTTAAAAAATAAAAAAATTAGCCAGGC 
GTGGTGGTGGGCGCCTGTAGTCCCAGCTACTCAGGAGGCTGAGGCAGGAGAATGGT

GTGAACCCGGGAGGCGGAGCTTGCANTGAGCTGAGATCGTGCCACTGCACTCCAGC 
CTGAGCGACAGAGCGAGACTCTGTCT 


Sequence ID 698 

TCATTAGAATCCAAGCTTTGAAAATTTCTGATTAATGCTCATGTATTTCTTTATCT 

TTGTTTTTCCTTGTGAAGAAAGACTTTCACCACTGTCTGAGTGATGATGCTGTTGA 

TAAGGATGATGTCGATGACTACTATATTGCATCTCTCAGGAACAGCTGATGGGAAG 

GGAGGGGCTGCTGAGTTCCCTTGTTCTAGCTAGCAGCACGCTCCTCANAGAGGGGG 

CCGAGTTACAGACAGCAGCCGCATTCTCATGCAAAATTAGTTTTAAACTGCTAGTG 

TGGGCATCGGTACCTTTTGCCTGGGTGATACCGAAGAATTGTTGAGGATTTAGTAT 

GCTCCGTAGAGACAGTTCAGCCAGTCATTTCTGCATTGGAGAGACTTCTCATACTT 
TCTTTGAAGACTCATAGAAAGCTGGAT 

Sequence ID 699 

ATTAAGGTTTGTNCCCAACAAGAATAGATGTAATTAGAAAAAANTGNCTTCCTTAC 

CTATTGCCTCTGATNTTTACTTGCTTAAATTTTTTTTATTGNAAATCCAGAAAAAG 

NGGATTTAGAGAACAACACTAACTCCCACCTAATCTATGACAGANATGTACAANAN 

AGTACCTGTGAAAAATGTGAAAGNATNTGAAAAATGTAACCTTTGGCAGCCTGAGC 

ATAGTCAACCAGAAAAACTATCTGAATTAAAATAATTGGTCCATAGGTACTATTTT 

ATTTGGTCCATAAGGATTATTTTTTCAACTTTTTTTTCAAGTGTATTATTATGTCA 

TTTCCCACGTAGGTTACTGATACCTGAAGACTTTTTNCACCTTTAACCTTNCTCGT 

TGAGGAGCTTTGTANTCTAATAAAAGAGAAATATAAGTAAATGTTAGATATATGGG 

NGGATAATGGTAACTATGTGCTTAAAGAGGTATAAAAGAAGGGTAGGGAGCAGATA 

AGACAAAGGAAGGGCTATATTATAANGAAGAATATTCCAAGTAGGGAAGAGAAAAA 

GATATGTTATCCATATAATATTTTATGTGCAGTAGAGAACATGTTCTATAGAANAG 
ACAGAAGATG 

Sequence ID 700 

CTTGAGCCCAGGAATTCCAGCCTGGGCAATATAGTAAGACTCCGTCTCTACAAAAG 
ATACAAAAATTAGCCAGATGTGGTGGTGCGTGCCTGTAGTTCCAGATACTGGAAAG 
ACTGAGGCAGGAGGATTGCTTGAGCATGGGAAGTTGAGGCTGCAATGAGCTGTGAT 
TACGCCACTACACTCCAGCCTGGGCAACAGAGTAAGATCTTGTCTCAAAAAAAAAA 
TTGAATTCAGCTAAAAATAATAAAATTTTAAAATAATTTTAAAAAGCCCTCAACAG 
CTTTGTTTTTCTCTCCTTGCCAGCTTCTCTGCAGCCTATAGCCTGCAGGCTGGCTG 
CTGCGAGCCAGGACAAGCGGTGGGAAATGCAATCACAGCGTGAAATCTCTGTGTTC 
AGAGACACGCAGGAAGCAGGTGAACCATGAAGGGCCAACACATGCCCCCAGTTAGC 
AGGGTGTAGAGACCGGGGCAGGGCTTTCTTCTTCCTTCTGGGTTATAAATATCCAT

GTCCTGCCATTTGAAGCTGCAAGTGGCACACATGGATGCTGGACAGGCGCTCGCAC 

TTTCTGGGCAGGGCANGGGGCTCAAAGGCAGGACAGCTGGGCAAAAGCACCTTGCG 
TGGGCCC 

Sequence ID - 701 nt . 57Q 

CTTTGGAGCTTCTGTCTGTGCTGTGGACCTCAATGCAGATGGCTTCTCAGATCTGC 

TCGTGGGAGCACCCATGCAGAGCACCATCAGAGAGGAAGGAAGAGTGTTTGTGTAC 

ATCAACTCTGGCTCGGGAGCAGTAATGAATGCAATGGAAACAAACCTCGTTGGAAG 

TGACAAATATGCTGCAAGATTTGGGGAATCTATAGTTAATCTTGGCGACATTGACA 

ATGATGGCTTTGAAGGTAATTAAAATTATCAAATTGGTGCTTGATTTCTGCTTTTA 

AAATGGTTTATGGAAGAAAATATGATTAAAGTTTTGTATTGTTTTCCTTCCTATAG 

AAGATGGAGCCAGAATGGCATGCTAAGTTTTTTCTTTTCTTTAGTGTTATATATGA 

CTTCTCCTCAATTGTCACCCATTGATCTTTACCACTGTTAATAATGGATGATATTC 

AAAATACCTTATTTCAGTGATTCTAAGGCACCATTGATTAGAAACTGCATTATTAT 

TTATGTGTCCCTAAAAGCTACCTATTAAGCTGTTACACCCACCATTTTTCTGTTAA 
GAAAATCCTGATTTCAGAA 

Sequence ID 702 

GTNNTCCTCTCGGAACGCGCCTTNTGTAGCCAGGTGCTACCAGACCNAATACACGG 

TTGTTCCAGCTTGCGCATTCACCGATGGCGTAGATATCCGGATCGGAAGTCTGGCA 

GGAATCATTAATGACAATACCCGCACGCGGAGCAACGTCCAGACCACACTGGGTTG 

CCAGCTTATCGCGCGGACGGATACCGGTAGAGAAGACGATAAAGTCGACTTCCAGT 

TCGCTGCCGTCGGCAAAACGCATGGTTTTACGCGCTTCAACACCTTCCTGCACAAT 

CTCAAGGGTGTTTTTGCTGGTGTGAACGCGCACGCCCATACTTTCGATTTTGCGAC 

GCAGCTGCTCGCCGCCCATCTGATCAAGCTGTTCTGCCATCAGCATAGGGGCAAAT 
TCGATAAC 

GTGGGTTTCAATACCTAAGTTTTTCAGCGCGCCTGCGGCTTCCAGACCTAACAGGC 

CGCAATTCGAGCTCGGCCGACTTGGCCAATTCGCCCTATAGTGAGTCGTATTACAA 

TTCACTGGCCGTCGTTTTACAACGTCGTGACTGGGAAAACCCTGGCGTTACCCAAC 

TTAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAAGAGGC 

CCGCACCCGATCGCCCTTTCCAACAGTTGCGCACCTGAATGGCGAATGGAAATTGT 
AAGCGTTAATATTTTGTTAAAATTCGCGT 

Sequence ID 703 

CTGCGCAGACCAGACTTCGCTCGTACTCGTGCGCCTCGCTTCGCTTTTCCTCCGCA 
ACCATGTCTGACAAACCCGATATGGCTGAGATCGAGAAAT'TCGATAAGTCGAAACT 
GAAGAAGACAGAGACGCAAGAGAAAAATCCACTGCCTTCCAAAGAAACGATTGAAC

AGGAGAAGCAAGCAGGCGAATCGTAATGAGGCGTGCGCCGCCAATATGCACTGTAC 

ATTCCACAAGCATTGCCTTCTTATTTTACTTCTTTTAGCTGTTTAACTTTGTAAGA 

TGCAAAGAGGTTGGATCAAGTTTAAATGACTGTGCTGCCCCTTTCACATCAAAGAA 

CTACTGACAACGAAGGCCGCGCCTGCCTTTCCCATCTGTCTATCTATCTGGCTGGC 

AGGGAAGGAAAGAACTTGCATGTTGGTGAAGGAAGAAGTGGGGTGGAAGAAGTGGG 
GGTGGGACGACAGTGAAATCTAA 

Sequence ID 704 

CTTGTATTCAAGAACTACTGTAATGCATTAGTGGTCTGGCTTCATTTTGTATGATG 

CCAGATCCTTAATTTACCCAGCACAATCATTTCAGTAGTTTCCTATGGCTCCTGCA 

AAAATGCAAACAGAAACCACCACAGGAACAGCCCCTTGCTGCCTCCTGTTGCTGAG 

GTAGTAGTCGCTAAAGAAAATTGAAGGCTCCTTACAATCTATATTTGAAAACTAGA 

ACTTCTGTAGAAACACACAGATCCCGATCTTAGAAGTTGTACAGGACAATCTGGTA 

AAACTGACATAATTGTGATTTATTAACATGAATTAAAATGCCCAACCAGTGCTTCA 

GTGTGACAGTATATTTAAAATAAAAAAGAAATTAAAGGTCATATACTGTACTACTT 

TCACAAAGATCCACAGTTTTGCAAAAGACTTGTCATATGTACAATGCTATATATCA 

AATGAGAAAAGCTGTAAGCAATTATATACGCAAAAGAAATGGCAGTA 

Sequence ID 705 

TTCCAGTCCTTTCATTTAGTATAAAAGAAATACTGAACAAGCCAGTGGGATGGAAT 

TGAAAGAACTAATCATGAGGACTCTGTCCTGACACAGGTCCTCAAAGCTAGCAGAG 

ATACGCAGACATTGTGGCATCTGGGTAGAAGAATACTGTATTGTGTGTGCAGTGCA 

CAGTGTGTGGTGTGTGCACACTCATTCCTTCTGCTCTTGGGCACAGGCAGTGGGTG 

TAGAGGTAACCAGTAGCTTTGAGAAGCTACATGTAGCTCACCAGTGGTTTTCTCTA 

AGGAATCACAAAGGTAAACTACCCAACCACATGCCACGTAATATTTCAGCCATTCA 

GAGGAAACTGTTTTCTCTTTATTTGCTTATATGTTAATATGGTTTTTAAATTGGTA

ACTTTTATATAGTATGGTAACAGTATGTTAATACACACATACATATGCACACATGC 

TTTGGGTCCTTCCATAATACTTTTATATTTGTAAATCAATGTTTTTGGAGCAATCC 
CAAGTTTAAGGGAAATATTTTTGTAAA 

Sequence ID - 706 nt: 496

CAACCCTCTCTCCTCAGCGCTTCTTCTTTCTTGGTTTGATCCTGACTGCTGTCATG 
GCGTGCCCTCTGGAGAAGGCCCTGGATGTGATGGTGTCCACCTTCCACAAGTACTC 
GGGCAAAGAGGGTGACAAGTTCAAGCTCAACAAGTCAGAACTAAAGGAGCTGCTGA 
CCCGGGAGCTGCCCAGCTTCTTGGGGAAAAGGACAGATGAAGCTGCTTTCCAGAAG 
CTGATGAGCAACTTGGACAGCAACAGGGACAACGAGGTGGACTTCCAAGAGTACTG 
TGTCTTCCTGTCCTGCATCGCCATGATGTGTAACGAATTCTTTGAAGGCTTCCCAG
ATAAGCAGCCCAGGAAGAAATGAAAACTCCTCTGATGTGGTTGGGGGGTCTGCCAG 



ACACGTGCTTGATGCTGAGCAAAGTTCAATAAAGATTTTGGGAAGTTT 



Sequence ID - 707 nt : 397 

CGGATGTGGTGGCAGGCGCCTCTAGTCCCAGCTACTCGGCAGGCTGAGGTAGGAGA 

ATGGCTTGAACCCAGGAGGTGGAGCTGACAGTGAGCCGAGATCGCGCCACTGCACT 

CC^GCCTGGGCGGCAGAGCGAGACTCCATCTCIAAAAAAAAAAAAAAAAAAAATAGA 

CTTTGAGACCAGCCTGACCAACATAGTGAAACCCGTCACTACTAAAAATACAAAAA 

TTACCCGGGCGTGGTGACGGGCGCCTGTAATCCCAGCTACTTGGGAGGCTGAGACA 

GGAGAATCACTTGAACCAGGGAGGCGGAGGTTGTAGTGAACTGAAATCGTGCCCCT 

GCACTCCAGCCTGGGTAACAAGAGCGAAACTCCGTCTCAAAAATAAATAAATAAAT 
AAAAT 



Sequence ID - 708 nt: 293

CCAGCTTTTTATGGTGTTTAATCTAATACACTTAAGCTGCAGTCCCAAAATTAGGG 

GTCCTTCAGTCTTGGAGACTATAAGGGAGCCTCTGCACCCAGGGAAAATGTTACCC 

TTTACAGGGGGGAAGGGTAAACCAGTAGGGAATACAGTACAATCCCAACCCTACTG 

GGAGGGGCGGGAGGGAGGTGTTGCCGTCACTGTATTAAGTCGATGTTGGGAAACGT 

TTTAACATCTGGAGCCTTTGTGGGTGGAAATATGTCTCCAGTTACAACTCCGCAGT 
GGATGTGAAGAAG 



Sequence ID 709 

GGAAGCTACAATGATTTTGGGAATTACAACAATCAGTCTTCAAATTTTGGACCCAT 

GAAGGGAGGAAATTTTGGAGGCAGAAGCTCTGGCCCCTATGGCGGTGGAGGCCAAT 

ACTTTGCAAAACCACGAAACCAAGGTGGCTATGGCGGTTCCAGCAGCAGCAGTAGC 

TATGGCAGTGGCAGAAGATTTTAATTAGGAAACAAAGCTTANCAGGAGAGGAGAGC 

CAGAGAAGTGACAGGGAAGCTACAGGTTACAACAGATTTGTGAACTCAGCCAAGCA 

CAGTGGTGGCAGGGCCTAGCTGCTACAAAGAAGACATGTTTTAGACAAATACTCAT 

GTGTATGGGCAAAAAACTCGAGGACTGTATTTGTGACTAATTGTATAACAGGTTAT 

TTTAGTTTCTGTTCTGTGGAAAGTGTAAAGCATTCCAACAAAGGGGTTTTAATGTA 
NATT 

Sequence ID 710 

TGGATTCCCGTCGTAACTTAAAGGGAAACTTTCACAATGTCCGGAGCCCTTGATGT 
CCTGCAAATGAAGGAGGAGGATGTCCTTAAGTTCCTTGCAGCAGGAAGCCACTTAG 
GTGGCACCAATCTTGACTTCCAGATGGAACAGTACATCTATAAAAGGAAAAGTGAT

GGCATCTATATCATAAATCTCAAGAGGACCTGGGAGAAGCTTCTGCTGGCAGCTCG 

TGCAATTGTTGCCATTGAAAACCCTGCTGATGTCAGTGTTATATCCTCCAGGAATA 

CTGGCCAGAGGGCTGTGCTGAAGTTTGCTGCTGCCACTGGAGCCACTCCAATTGCT 

GGCCGCTTCACTCCTGGAACCTTCACTAACCAGATCCAGGCAGCCTTCCGGGAGCC 

ACGGCTTCTTGTGGTTACTGACCCCAGGGCTGACCACCAGCCTCTCACGGAGGCAT 
CTTATGTTAACCTACCTACCATTGCCCTGTGT 

Sequence ID - 711 nfc . ^ Q 

GTGGTACATATACACAAAGGAAAACTATGTAGCCATTAAAAGAAAAGGAACTCCTA 

TCATTTGTAACAACATAAATAAATCTGGAGGAGATTAGGCTAAGGTGAAATAAGCC 

AGGCACAAAAAGACAACTACCATATGATCTTACTTATACGTGTGTGGAATCTAAAA 

AGGTGGAATTTACAGAAGCAGAGAGTAGAATGGTGATTACCAGAGGCTGGGGAGTG 

AGGGCAGGAGGTTGGAGAAATGTTGGTCAAAGGATACAAAGTTTCAGTTATACAGG 

ATGAATAAGTTCAAGAGATCTATTGTACAACGTGGTGGCTATAGTTGATAACAATG 

TATTGTGTTCTTGAAAAATGCTGAGAGAGTAGATTTTAAGTGTTCTCACCACAAAA 

CATAAGTATGTGAGGTAATGCATGTGTTAATTANCTTAATTTAGACATTTCATAAT 

GTATTATACATATTTCAAAACCACGTTGTACATGAGAAAGATACACAATT 

Sequence ID 713 

GCCCAGTCGACCCATGTTCTCCTTTCTACACCAGCATTAGACGCTGTCTTCACAGA 

TTTGGAAATCCTGGCTGCCATTTTTGCAGCTGCCATCCATGACGTTGATCATCCTG 

GAGTCTCCAATCAGTTTCTCATCAACACAAATTCAGAACTTGCTTTGATGTATAAT 

GATGAATCTGTGTTGGAAAATCATCACCTTGCTGTGGGTTTCAAACTGCTGCAAGA 

AGAACACTGTGACATCTTCATGAATCTCACCAAGAAGCAGCGTCAGACACTCAGGA 

AGATGGTTATTGACATGGTGTTAGCAACTGATATGTCTAAACATATGAGCCTGCTG 

GCAGACCTGAAGACAATGGTAGAAACGAAGAAAGTTACAAGTTCAGGCGTTCTTCT 

CCTAGACAACTATACCCGATCGCATTCAGGTCCTTCGCAACATGGTCACTGTGCAG 
ACCTGAGCAACCCCACCAAGTCCTTG 

Sequence ID 714 

CTGTAACAGAGATTCCTTTTTTCAATAATCTTAATTCAAAAGCATTATTAGACTTG 
AAAGGGTTTGATAATCTCCCAGTCCTTAGTAAAGATTGAGAGAGGCTGGAGCAGTT 
TTCAGTTTTAAATGAGTCTGCAGTTAATATCAAATGTGAGTTTGGGACTGCCTGGC 
AACATTTATATTTCTTATTCAGAACCCTTGATGAGACTATTTTTAAACATACTAGT 
CTGCTGATAGAAAGCACTATACATCCTATTGTTTCTTTCTTTCCAAAATCAGCCTT 
CTGTCTGTAACAAAAATGTACTTTATAGAGATGGAGGAAAAGGTCTAATACTACAT
AGCCTTAAGTGTTTCTGTCATTGTTCAAGTGTATTTTCTGTAACAGAAACATATTT

GGAATGTTTTTCTTTTCCCCTTATAAATTGTAATTCCTGAAATACTGCTGCTTTAA 

AAAGTCCCACTGTCAGATTATATTATCTAACAATTGAATATTGNAAATATACTTGG 
CTTACCTCTCAATAAAAGGGTCTTTTCTATT 

Sequence ID 717 

TCCACCCACCTTGACCTCCCAAAGTGCTGGGATTATAGGCGTGAGCCACCTCGCCC 

AGCCCGATACTAGGACTTATGCAGAAAAAACCTTGACATGGAGGAAAGTAAGATCT 

AAATAAATACTGTATTCATAGATTAAAAGACTCAGCATAATAAATATACCATTTCT 

CCCCAGATTGATGTACAGATTTAACACAATTCCTATCAAGATCCCAGCAAGATTTT 

TGTAGATATGTAAAAGATTATTCAAAAATGTAAAAGGAAGGACAAAGGACTAGAAT 

AGATAAAACAAAATGGAGAAAGATTTAATAGGAATCACTGTAACTGATTTTAAGAC 

ATACAGAACAATAATAGAAACTGCTTGTATTAGTCCATTTTCACGCTGCTGATAAA 

GACATACCTGAGATTGGCAATTACAAAGGAAAGANGTTTATTGGCTTACAGTTCCC 
ATGGCTGGGGAGGCCT 

Sequence ID 718 

CTCCTCTGGGTTGAAACCCGGGCGCCGCCAAGATGCCGGCTTACCACTCTTCTCTC 

ATGGATCCTGATACCAAACTCATCGGAAACATGGCACTGTTGCCTATCAGAAGTCA 

ATTCAAAGGACCTGCCCCCAGAGAGACAAAAGATACAGATATTGTGGATGAAGCCA 

TCTATTACTTCAAGGCCAATGTCTTCTTCAAAAACTATGAAATTAAGAATGAAGCT 

GATAGGACCTTGATATATATAACTCTCTACATTTCTGAATGTCTGAAGAAACTGCA 

AAAGTGCAATTCCAAAAGCCAAGGTGAGAAAGAAATGTATACGCTGGGAATCACTA 

ATTTTCCCATTCCTGGAGAGCCTGGTTTTCCACTTAACGCAATTTATGCCAAACCT 

GCAAACAAACAGGAAGATGAAGTGATGAGAGCCTATTTACAACAGCTAAGGCAAGA 

GACTGGACTGAGACTTTGTGAGAAAAGTTTTCGACCCTCAGAATGATAAACCCAGC 

AAGTGGNGGGCTTGCTTTGTGAAGAGACAGTTCATGAACAANAGTCTTTCAGGACC 

TGGACAGTGAAGGGAGCCCGGGCAGCCA 

Sequence ID 719 

CGNGGCCGCGTNAACTTTTGATCGTCAGCTGGGGCTGGCAGGCACCTAAATGGGAA 
GGGTGATAGCAGTGTGTTGGGGGGAGTTTAGGGAACGGTCCTCTACCGATAGAGGC 
AGCANCTCATTGGAATTTCCTCCTGAAGTTGTCTTGCCCCTTGAATCCTGCAGGAA 
GGCTGGCAAATGGCCATTTCCCTTCCACTTGAATAGAGACCCATAACTCAAGTATC 
TGCCCTTAAGACACCACAGGACTGTTCTTCGCGGGCCCTGCCCCTGGATTTGGGAG 
AGGCAGTCCANCTCACCCAACTAGGCTCTGCANGGGGACCANGAGGGATGGGTTGT 
GTCCACAGGACCAGCCAGACTGATGAGGGATGCGGCAAGCATATTCTCACCACCTT 
CTTTCACGTTTACAACANACCAGCNTTCCCTGTGTGGCAGGGGTTACATTGGTCAC
CGAGGACCTANAATCATGGAGTGCTCTGGGGATCCGGGCTTGGA 

Sequence ID 720 

TCAGTGTTGAATTTTGTCAGACACTTTCTCTGCATCAATTGGTATGACCATGTGAT 

TTTTTTTCTGTAGCCTGTTAATATGGTTAATTTTCAAATATTGAGCTGATTAATTT 

TCAAATATTGAGCTCTCCTTGCATCTCTGGAATAAGTACCACTTGGTCGTGGTATA 

TATTTCTTTTAATATATTGCTGAATTCTGTTTGATCATGTTTTCTTAAAGACTTTC 

GTGTCTGTTTTCATGATAGATACTGGTCTATAGTTTTGTTGTAATATCTTGGTTTG 

ATTTTGATATCAGGATAATGCTACCTTAATAGAATGAATTGGAGCCAAGTATGGTG 

GCAAATGCCTATAGTCCTAGCTACTCAGGAGGCTGAGGTGGTGGGGACTGCTTGAC 
CCANGAGTTCAAATCTAGCTTGGGCAATGTAGCAAGAC 

Sequence ID 721 

TAGAAGGAATGACTATTCATGTCCAAAGTGAATGGTTTTGTGCAGTGAACAACACA 

TGGCGAGGTACTAACTGAGAAACTTTTTCATGCTTTATGCCTACCTCTTGTAGTTG 

TTGCAGAGCAAATATAAATTGTAATAAGATAGCTAGGCCTTGCAGAAACAAACAGA 

AAAACTTAAAAAAAAATGATATAAGAGCTGGAGTCTAGTATTTATATGAATCTGTG 

AGAGATAATTTTTTTGGTCTCACTGCAATGAACCAAAAGCGGCTGAGTTTGGTTTT 

TAATTGTAGCCATGTATTGAAGGCATCTTTTTGACCAACTCTTGTTGGTTCTGTCT 

TGAACCATTGTTAATCACTGTGCTGTAATTAGTATAGCTAAATCTTTTCCTTCCTT 

GCTCCTCCCCCAGCCCACCCCGTCTTCCCTTAACATTTTTTCAGGGGGGGTTGGGA 

GTGGTTTCATTTTAATGTGAGTGGATGTTTTGATAGTTGTAAGGAAAAAATGCATT 

TCAGACACATTTCACACATGAGCTATTTTCTTACACAGTATGTCTTATTGGTAATA 
AGAATGTAATTCAT 

Sequence ID 722 

CNTTCCNTAAGAATACAAAAAATTAGCTGGGCGTGGTGGCAGGCGCCTGTAATCCC 
ATCTACTCAGGAAGCTGAGGCTGGAGAATCGCTTGAACCCGGGAGGCGGAGGTTGC 
AGTGAGCAGAGATCACGCCACTGCAGTCCAGCCTGGGCAACAGTGCGAGACTCTGT 
CTCAAAAAAAAAATAAATAAATTACCTGGGTGTGGCAGCGCGTGCCTGTAATCCCA 
GCTACCCAGGAGGCTGAGGCAAGAGAACTGCTTGAACCCAGGAGGCAGAGGTTGCA 
TGGAGCTGAGATGGCGCCACTGCACTCCAGTCTGGTGACAGAGTGAG 



Sequence ID 724 

CTCTCTACTAAAAATACAAAAATTAGCTGGGCACGGNGGTGCATGCCTGTAAACCC 

AGCTACCAGGTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCAGGGAGTCGG
AGGTTGCGGCGAGCTGAGATCATGCCACTGCACTGCGGCCTGGAGACAAGAGCAAG
ACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAA 

NGNGNGGACCTTATTTGGCTNTTAATTCAAACTATTAAAAATGTGAACN 

Sequence ID - 726 nt: 260

CGGGGTCTGTACCGGGCTGGCCTGTGCCTATCACCTCTTATGCACACCTCCCACCC 
CCTGTATTCCCACCCCTGGACTGGTGGCCCCTGCCTTGGGGAAGGTCTCCCCATGT 
GCCTGCACCAGGAGACAGACAGAGAAGGCAGCAGGCGGCCTTTGTTGCTCAGCAAG 
GGGCTCTGCCCTCCCTCCTTCCTTCTTGCTTCTCATAGCCCCGGTGTGCGGTGCAT 
ACACCCCCACCTCCTGCAATAAAATAGTAGCATCGG 

Sequence ID 727 

CTGAGTNTAGAAATGATGCCATTAATACTGATTGCAAAAACATTACAACTCAGTAC 
TGCAGCTTTCATTCAAATAGGTTATATGTATAAACTGAGTTCAACAATATTGTATT 
TGAGATGGTAAAGTTAAAGAAATGCAATAATGTAAATAATACTTAAGAAAATAAGA 
TCTCAGGAAACTGTATATACTCTGTACTTTTATGCAACTTTATCAGATCATTTCAG 
TATATGCATCAAGGATATAGTGTATATGACATGAACTTTGAGTGCAAAAACTGTAC 
TATGTACCTTTTGTTTATTTTGCTGTCAACATCTAAATAAAGGTTTTTTTGTTTGT 
TTTTTGTTTTTTTAATTGTTTTGTTTTAAAGATTGTTTTAATTAATTAAAAAATTA 
ATTGTTTTAATTAAACAATTGTTTAATTGTTTTAAAGTCGCCAGGCTGAGGCAGGT 
GAATCACAAGCTTAGGAGTTGGAGGCTAGCCTGCCAACATGGTGAAACCCCGTCTC 
TACTAAAAATACAAAAAAATTAACTGGGTGTGGG 

Sequence ID 728 
CCCATCTGCACCAGTACAC^ 

GTGGCAACTTGGGATTCATTCTGGTGATTCTGAACCTTGCCTCATAGCTTAAAGTA 
TAAAAAAGATTCAAGAGCAGTGAGGTTTGTTCTTTCCAGTGAATGGTGGACTGAGT 
GGTGCGAGGTGGAGGGCTAACAAGAGGAAAGAACTACATTCTTCAGAATACAGTGA 
TGAAAATTCATTTTGAAACTCAAATATTTTCATTTTGGATATTCTCCTGTTTTTAT 
TAAACCAGTGATTACACCTGGCCATCCCTCTAAATGTTCTAGGAAGGCATGTCTAT 
TGTGATTTTGATGAAGACAGAATTATTTTTCTCTGTAGAAACACAGATACCACTTT 
ATCAGGGGAAGTTAGTCT^AATGAAATGGAAATTGGTAAATGGACAAAAGCTAGCTA 
GTAAAAAGGACGACCCAGCAACATGCTTTAACCCCATTGTATGTTTGTGGAAAGAG 
CATAGTTTAACATCTTGAGAAATTTGGGACATAAAAGTTTTCATNGGTAGACAGTT 
CATGGCAGTATATGAATTGACATAATGGAAATAATCTGATTTTATTTTTACAACTA 
ACATCCTTTCCCC 
Sequence ID - 736 nt: ^

GGAATTCCAAGTGCTTGGGGATAATGATACCTCTGACCTTTCTTCCTTTTGGGAAG 

TACTTGAGTGTGCAGCTGCATGAGGCCTCAGCAGGAGAGAGATTTTAGGTCCAAGA 

AGCTATACCAGTAGGACAAGGCAGGAAAATACTACACTTTCAGGATCAAGCCCCTC 

TGACTCTCATTTGGAAACTGGATGTTTGCTAAGCACCTGCTTCTTAAGGATGCCGA 

GGGATTTAATGATACTCCCAGAAACCTGGAGAGATTAATGGGGCCTATGGAGAAGT 

GCTCTGAACTCAGTGTTGGGACTTGAATAAAATTAACCATTGTCATGTTTTCAGAA 

CAACTAAGCTGTTTTATATTTCATGTGCATGAAAGCCCTAGAACTAAGTTGTGTTA 

TTTCCAGAAATGAAATAGATCCCACAGTTAGATGATGTGGCCATTAGGAAGTACCA 

AATTTATAAAAATCACTGGAGGTCTGTCTGAGCAGTACCTAATAAAATATAGTATA 

CTGAAAGTGAACAGATACTTTGTCTCTTTCTTTGGCTGCTTGATCTTTATCTGTGT 

CTGCCGTACAGTGCACCCTTAAAGTATTCTACACCAGTGCTTCTCAAACTGGAAAT 
GTGCATGTAAGTCACCCANGGGTCT 

Sequence ID 739 

TGCATGCCCATAGTCCCAGCTATTTGGGAGGCTGAGGCAGGAAAATCGCTTGAACC 

CGGGAGCCAGAGGTTGCAGTGAGCCGAGATCGCACTCCAGCTTGGCGACAGAACAA 

GACTCTGTCTCAAAAAAAAAAAAAAAAGAAATCTTGGGATCCTGAACCCCTTACTC 

GAAGGGCTAAGGTAGCATCTCAGCATGTCTTATTCGAGACTTCGTANAACCAGACC 

TGCTGTTTGTAGATGTTAATTAATCAAACCTTTCTCTACTCATTCTGGACCAGTTA 

AGGTTTTCTCCTTCTCCGTATGAGTTTTGATTTTCGTCCTCCTTGGTTGGAGATCA 

CACTTTGGTCTGCTGCTAAGTTGGATGCCTCCCACTGTCTTTCCCTAAGTCTAGGG 

CTTCANACCCCAGTGTGGGGAGAGGGACTTTCGTTTCCTGCCCCTCACCACATCAG 

ACACAGGCAGGCAAGAATAAGATGGCCAAAAGGCCGATGAACTTCTTGACCTAGCC 

TGGGACATTACCTGTTACTAGGTGGACTTCACTGCCTGTGAATGGAAGCTGAAGGG 

CTGTTTTTTTGGTTTGTATTTGGACAGGCCAGGCTTANAGAGGGAGAGAACTGGGC 
TACTCTTCAGCAGTGATCTTTAAAATGCC 

Sequence ID 747 

CAGAGTGCAAGACGATGACTTGCAAAATGTCGCAGCTGGAACGCAACATAGAGACC 
ATCATCAACACCTTCCACCAATACTCTGTGAAGCTGGGGCACCCAGACACCCTGAA 
CCAGGGGGAATTCAAAGAGCTGGTGCGAAAAGATCTGCAAAATTTTCTCAAGAAGG 
AGAATAAGAATGAAAAGGTCATAGAACACATCATGGAGGACCTGGACACAAATGCA 
GACAAGCAGCTGAGCTTCGAGGAGTTCATCATGCTGATGGCGAGGCTAACCTGGGC 
CTCCCACGAGAAGATGCACGAGGGTGACGAGGGCCCTGGCCACCACCATAAGCCAG 
GCCTCGGGGAGGGCACCCCCTAAGACCACAGTGGCCAAGATCACAGTGGCCACGGC
CACGGCCACAGTCATGGTGGCCACGGCCACAGCCACTAATCAGGAGGCCAGGCCAC

CCTGCCTCTACCCAACCAGGGCCCCGGGGCCTGTTATGTCAAACTGTCTTGGCTGT 
GGGGCTAGGGGCTGGGGCCAAATAAAGTCTCTTTCCTC 

Sequence ID - 757 nt: 583

GAACCCTGCGGAGGGACTTCAATCACATCAATGTAGAACTCAGCCTTCTTGGAAAG 

AAAAAAAAGAGGCTCCGGGTTGACAAATGGTGGGGTAACAGAAAGGAACTGGCTAC 

CGTTCGGACTATTTGTAGTCATGTACAGAACATGATCAAGGGTGTTACACTGGGCT 

TCCGTTACAAGATGAGGTCTGTGTATGCTCACTTCCCCATCAACGTTGTTATCCAG 

GAGAATGGGTCTCTTGTTGAAATCCGAAATTTCTTGGGTGAAAAATACATCCGCAG 

GGTTCGGATGAGACCAGGTGTTGCTTGTTCAGTATCTCAAGCCCAGAAAGATGAAT 

TAATCCTTGAAGGAAATGACATTGAGCTTGTTTCAAATTCAGCGGCTTTGATTCAG 

CAAGCCACAACAGTTAAAAACAAGGATATCAGGAAATTTTTGGATGGTATCTATGT 

CTCTGAAAAAGGAACTGTTCAGCAGGCTGATGAATAAGATCTAAGAGTTACCTGGC

TACAGAAAGAAGATGCCAGATGACACTTAAGACCTACTTGTGATATTTAAATGATG 
CAATAAAAGACCTATTGATTTGG 



Sequence ID - 758 nt: 424

CTTGGCTCCTGTGGAGGCCTGCTGGGAACGGGACTTCTAAAAGGAACTATGTCTGG 

AAGGCTGTGGTCCAAGGCCATTTTTGCTGGCTATAAGCGGGGTCTCCGG7AACCAAA 

GGGAGCACACAGCTCTTCTTAAAATTGAAGGTGTTTACGCCCGAGATGAAACAGAA 

TTCTATTTGGGCAAGAGATGCGGTTATGTATATAAAGCAAAGAACAACACAGTCAC 

TCCTGGCGGCAAACCAAACAAAACCAGAGTCATCTGGGGAAAAGTAACTCGGGCCC 

ATGGAAACAGTGGCATGGTTCGTGCCAAATTCCGAAGCAATCTTCCTGCTAAGGCC 

ATTGGACACAGAATCCGAGTGATGCTGTACCCCTCAAGGATTTAAACTAACGAAAA 
ATCAATAAATAAATGTGGATTTGTGCTCTTGT 

Sequence ID - 764 nt: 626

GATTTTTTTTTTTTTTTTGAGATGGAGTCTTTCTCTGTCGCCCAGGCTGGAGTGCA 

GTGGTGAAATCTCGACTCACTGCAACCTCCGTCTCCTGGGTTCAAGCAATTCTCCT 

GCCTCAGCCTCCTGAGTAGCTGGGATTACAGGCACCAGCCACCACGCCCGGCTAAT 

TTTTGTATTTTTAGTAGAGACAGGTTTTCACCATGTTGGCTAGGCTGATTTTGAAC 

TCATGACCCCAAGTGATCTGCCCGCCTCGGCCTCCCAAAGTGCTGGAATTACAGGT 

GTGAGCTACCACTCCCAGCCAATGATTACATTTATAAGGTAAAATAACTTGTGCCA 

ATCTGTACAAGTGAATTCAGATTTAAAATTTTAATTGTAAAAAGATATCCAGGTGA 

TATTTCTCCCTGAATAATTTAGTTTCCTTTTCTATTTCTTGATATAAAAGTACTCA 

gcattgaagtaAttgotatcttcacatttcttcctatttgagctgtctaaataagt 
AGTCCTACATATTTTCCCCCCAACACAAAAAACCCAGAAAAGAATTATTTTATACT

GGATTTTTTTGGTTGTAGCAGGAACCTAAAGGNGCCAATTGTAACATGCATGTTCT 
TTTTGGCAAA 

Sequence ID 766 

GTCCATCCTGCAGGCCACAAGCTCTGGATGAGGAACTTGAGGCAAGTCACCAGCCC 

CTGATCATTTCGCCTAAAAGAGCAAGGACTAGAGTTCCTGACCTCCAGGCCAGTCC 

CTGATCCCTGACCTAATGTTATCGCGGAATGATGATATATGTATCTACGGGGGCCT 

GGGGCTGGGCGGGCTCCTGCTTCTGGCAGTGGTCCTTCTGTCCGCCTGCCTGTGTT 

GGCTGCATCGAAGAGTAAAGAGGCTGGAGAGGAGCTGGGCCCAGGGCTCCTCAGAG 

CAGGAACTCCACTATGCATCTCTGCAGAGGCTGCCAGTGCCCAGCAGTGAGGGACC 

TGACCTCAGGGGCAGAGACAAGAGAGGCACCAAGGAGGATCCAAGAGCTGACTATG 

CCTGCATTGCTGAGAACAAACCCACCTGAGCACCCCAGACACCTTCCTCAACCCAG 

GCGGGTGGACAGGGTCCCCCTGTGGTCCAGCCAGTAAAAACCATGGTCCCCCCACT 

TCTGTGTCTCAGTCCTCTCAGTCATCTCGAGCCTCCGTTCAAAATGATCATCATCA 

AAACTTATGTGGCTTTTTGACCTTTGAATAGGGAATTTTTTAAAATTTTTTAAAAA 
TT 

Sequence ID 768 

CCAGCGCAGGGGCTTCTGCTGAGGGGGCAGGCGGAGCTTGAGGAAACCGCAGATAA 

GTTTTTTTCTCTTTGAAAGATAGAGATTAATACAACTACTTAAAAAATATAGTCAA 

TAGGTTACTAAGATATTGCTTAGCGTTAAGTTTTTAACGTAATTTTAATAGCTTAA 

GATTTTAAGAGAAAATATGAAGACTTAGAAGAGTAGCATGAGGAAGGAAAAGATAA 

AAGGTTTCTAAAACATGACGGAGGTTGAGATGAAGCTTCTTCATGGAGTAAAAAAT 

GTATTTAAAAGAAAATTGAGAGAAAGGACTACAGAGCCCCGAATTAATACCAATAG 

AAGGGCAATGCTTTTAGATTAAAATGAAGGTGACTTAAACAGCTTAAAGTTTAGTT 

TAAAAGTTGTAGGTGATTAAAATAATTTGAAGGCGATCTTTTAAAAAGAGATTAAA 

CCGAAGGTGATTAAAAGACCTTGAAATCCATGACGCAGGGAGAATTGCGTCATTTA 

AAGCCTAGTTAACGCATTTACTAAACGCAGACGAAAATGGAAAGATTAATTGGGAG 
TGGTAGGATGAAACAATTTGGAGAAGATAGAAGTTT 

Sequence ID 773 

GAGGAAAGGGGAGTTAATATTTAGTGGACAGAATTTCAGTTTTACAGATGAAAAGA 
GTTCTGGAGATAGACGGTGTTGATAGTTGCACAGCAGTGTGAATGTGCTCATTGTT 
ACCGAACTTAAAAATGTTTAACATAGTATTATGTGATTTTTATTTTGCCACTTAAA 
AAAAAAGAATGAAGTACTGATACATGCTACAACATGGGTGAGCTTTAAATACATTC 
TGCTCAGTGAAATAAGCCAGATGCAAAAGATCACATATTATATAATCGACTTATAC
GAGATACCTAGAATAGGCAAATTCATAGAGACAGAAAGTAGAATAGTGGTTCCCAG

GGGCTGGGGACAAGGGGGCAGTGAGAGATTGAGAGTTATTATTAATGCGTACAGAG 

TTTCAGTTTGGGCTGATAAAAAAGTTCTGAAGATGGATGGTGATGATGGTTGTACA 

TCAATGTGAGTGTAATTACCGCCACTGAACTGCCCTTAAAAACGTTTAAAAGAGTA 
AATTTTATGTTGNGTATATTTTACCATAAT 

Sequence ID 116

TTTTTTTTTCATAAGAGGCAAGTACAAGAAAAAGCTTAATTACTTTAACTTCTAAG 

TAGTTTGGAATCTAAATAAATAGGAGTTACCAAATATATGCGCTTCTGTGAATAGT 

TTTCCCCCACATGTTTATTTATATTTTTGCATCTCATCAAACCTAACAGATTCTAA 

AGTCTCTGGTGATAATGACAATATCTGCTACGGAGAGACTAGCCTGGGGGAAGAGG 

ATCTCCCTGAACAAGGATAGCGGAGTTGCTGCAGCTTTCAAATGAAGCTGGACATT 

TAGCTGCGGGGGTAGCACCCTTTGATCAAGGCAGCCCAAAGATGAGTTTCAGGGAT 

GGGACTGACAGAAGAGAAAAGTTCTTCCCAGCCCTTTCTACTTTTTCTCTTTGTTT 

CTCAGGCTTCTGGCCGTCTTCAGTTTTCACAAGTTTCACTCTCAACCCTAAACAGT 

ACTTCTGTGAAGTACCCTTTGGCCCCTCGTTTTCAGCTCCTAAACTCACCTGGAAA 

TAGATGTCAATCTAATTTTGGGTCTGACTAGTGCAGTAGGCATTTTTGGTGA 

Sequence ID 782 

CTCACACAGAACAAAAATGAATGAGTGTGGCTGTGTGCCACTATCACTGTGTCTAC 
AAAAACAGCCAGTGGGCCTGATTTGGCCCTTGGCTGCAGTGCGCCCGTCTCTGTTT 
TTGAGGAATAAAATCGCATCATTTCATATGGCTAATGCAATTTTTTTCCCATCTGG 
AAGCAACATCTGATTGGACTCATCTTGTATGGTGCTTGTTACAGTCTCTGTAAATG 
GGAGAGGGTCCGAGAATAGCTCTTCCTGTTTTCATCAGGACTGTTTTTAGGGATGG 
CAAAGAAGTCAGTGTGTCCAGCCTGTGTCCTCCTCACCACGTGGCTGATTCCTGAA 
TCTGCATGTGCANCACNTGCCGTTGTCTGGGGCATGATCTGTGTGA 

Sequence ID - 785 nt: 556

CTTTTCTCTGGGTATAGATTTACCCTAGCACCTATCTCATTATATTGAATTTTCCA 
GCATATTTAAATAAACTATTAATTAGTCACACTATTTCTTAAAAGTCACACTATCA 
ACTAATCGTGACCGCAATTATCTAGGGGTGATAATCTGCTGAGTCTACTCTTTAAA 
TACACTGGGACCCAGCATATTGAGTTATATTGGCACAGAAACTTCACTCTGGGTAT 
AGATTTACCCTAGTACCTTGCCGGCAGGATCCTATTATTCATGGTTGTACAAGCAA 
GGTTCAGGGAAGAGGCTGGCACAGAGAAGGTACCTGGTAACTGTTGTTTGAGGCTG 
AATTCAGCTCAACTCAGCTCCAGTAGAGATGGTGTCCCCTTCTCTACCGTGTTGAG 
ATAGTGTGCAGTCCCTTCCTAAGGGCTGTTAGCCACCGGAATAGGACTTGTCAGCT 
TCAACTTTTAAATTTCTCTGCTCCCGCTGGGACCCACCCGCTTCAAAAATCATCAT
GGNGGNTTTAGCACCAATTTAGTAAACACAAACTGTCTGAAATATTTTGGAT

Sequence ID 796

GAACATTCAAGATAGTGAGAGGAAGAAAAAGATATGGCTGTACGGGACCGAGGTCT 

CTTCTATTATCGCCTCCTCTTAGTTGGCATTGATGAAGTTAAGCGGATTCTGTGTA 

GCCCTAAATCTGACCCTACTCTTGGACTTTTGGAGGATCCGGCAGAAAGACCTGTG 

AATAGCTGGGCCTCAGACTTCAACACACTGGTGCCAGTGTATGGCAAAGCCCACTG 

GGCAACTATCTCTAAATGCCAGGGGGCAGAGCGTTGTGACCCAGAGCTTCCTAAAA 

CTTCATCCTTTGCCGCATCAGGACCCTTGATTCCTGAAGAGAACAAGGAGAGGGTA 

CAAGAACTCCCTGATTCTGGAGCCCTCATGCTAGTCCCCAATCGCCAGCTTACTGC 

TGATTATTTTGAGAAAACTTGGCTTAGCCTTAAAGTTGCTCATCAGCAAGTGTTGC 

CTTGGCGGGGAGAATTCCATCCTGACACCCTCCAGATGGCTCTTCAAGTAGTGAAC 

ATCCAGACCATCGCAATGAGTAGGGCTGGGTCTCGGCCATGGAAAGCATACCTCAG 

TGCTCANGATGATACTGGCTGTCTGTTCTTAACAGAACTGCTATTGGAGCCTGGAA 

ACTCAGAATGCAGATCTTTTGTGAACAAAATGAAGCAAGAACCGGAGACNCTGAAT 

AGTTTTATTTCTGTATTAAAAACTGNGATTGGAACAATTGAAGA 

Sequence ID 801

CCACTCCACCTTACTACCAGACAACCTTAGCCAAACCATTTACCCAAATAAAGTAT 
AGGCGATAGAAATTGAAACCTGGCGCAATAGATATAGTACCGCAAGGGAAAGATGA 
AAAATTATAACCAAGCATAATATAGCAAGGACTAACCCCTATACCTTCTGCATAAT 
GAATTAACTAGAAATGAGGATTCTGACCTTGACTTTGATATCAGCAAATTGGAACA 
GCAGAGCAAGGTGCAAAACACAGGACATGGAAAACCAAGAGAAAAGTCCATAATAG 
ACGAGAAATTCTTCCAACTCTCTGAAATGGAGGCTTATTTAGAAAACAGAGAAAAA 
GAAGAGGAACGAAAAGATGATAATGATGATGAGTCAGGTAAAAGTTCCAGAAATGT
ATGATGATGAGCTGGGTTCAAACAAGATGATGAAATTGCTGAAGAAGAAGCAGAAG 
AAGGAAGCATTTCTGAAATATGAATGAAAAAAATTACATCTTTAGAAAAAGAGTTA 
TTAGAAAAAAGCCTTGGCAGCCGTCNGGGGGAAGTGACGCACAGAAGAGACCAGAG 
AATAGCTTCCTGGANGAGACCCTGCACTTTACCCATGCTGCTGGATGG 

Sequence ID - 808 nt: 641

CCGGGTTTTAGTATTTAACCAAGAGCCTTTTAAATATTGAAAACCCATAGTTCAGA 
AAATGTTAGTATTGCTGCCCTTCTTCACATAAATTTTTTTTTAAATTATACTATTA 
TTTTGCTTAATTTTATATTGGGTTAAAACAACCTTCAAGAAGGTTAACTAGGAAAG 
AAGACCTTTTTGTTTTATTTTTACTATTTATATATAGAAGACAAATCAGCATTTGG 
TGATAGTTTTACATGACCAGTTATCAAACGGTCATAGTATGAAGTGTGCAGTTGTT
CATTATTAGTAAATTATGTTTGATTTTTAAACTATTTAGTACTAATAGTTGAGATG

AAAACTGAAGAAAAATGCCAATGTGACGTTTGTGTATAGCTAGCCTTAAAAAACTT 

CCCATGTTTTTAGGTGACTTTTTTCCCCCTCTTAGTACTCTGGAGAAACAATGAAG 

ATGGGCCATCTCAATTCCAGATGTAAACAAAAAGTAATTTTTATTTCAACATTTAA 

TGTAACTGCTATTATTGNGGATTCTTGNCTTGNGTATTTTCTTTCCCTTATTCAAG 

TAATATAGAATAACTTTCCTTAAAATGATTTGATCCAAGATACGTCATTTCTGTAT 
TGGCAAAATGCCNCTATTAAAGTGT 

Sequence ID - 814 nt: 132

GTTAAAGTGATACATTTTTATACCAAATGTGTTTATTTTTTTGTGCAAGTAATCCT 

TAAAATTGCAATTGTATTAGGTGTTAAAATAAAGTTTTTAAAAAATTAAAAAAAAA 
AAAAAAAAAAAAAAAAAAAA 

Sequence ID 817 

GACAACCTTAGCCAAACCATTTACCCAAATAAAGTATAGGCGATAGAAATTGAAAC 

CTGGCGCAATAGATATAGTACCGTAAGGGAAAGATGAAAAATTATAACCAAGCATA 

ATATAGCAAGGACTAACCCCTATACCTTCTGCATAATGAATTAACTAGAAATAACT 

TTGCAAGGAGAGCCAAAGCTAAGACCCCCGAAACCAGACGAGCTACCTAAGAACAG 

CTAAAAGAGCACACCCGTCTATGTAGCAAAATAGTGGGAAGATTTATAGGTAGAGG 

CGACAAACCTACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCA 

ACTTTAAATTTGCCCACAGAACCCTCTAAATCCCCTTGTAAATTTAACTGTTAGTC 

CAAAGAGGAACAGCTCTTTGGACACTAGGAAAAAACCTTGTAGAGAGAGTAAAAAA 

TTTAACACCCATAGTAGGCCTAAAAGCAGCCACCAATTAAGAAAGCGTTCAAGCTC 

AAC^CCCACTACCTAAAAAAATCCCAAACATATAACTGAACTCCTCACACCCAATT 

GGACCAATCTATCACCCTATAGAAGACTAATGTTAGTATAAGTAACATGAAAACAT 

TCTTCTNCGCATAAGCCTGCGTCAGATTAAAACACTGAACTGACAATTAA 

Sequence ID - 821 nt: 370 

AAAGAGCTCCCAAATGCTATATCTATTCAGGGGCTCTCAAGAACAATGGAATATCA 

TCCTGATTTANAAAATTTGGATGAAGATGGATATACTCAATTACACTTCGACTCTC 

AAAGCAATACCAGGATAGCTGTTGTTTCANAGAAAGGATCGTGTGCTGCATCTCCT 

CCTTGGCGCCTCATTGCTGTAATTTTGGGAATCCTATGCTTGGTAATACTGGTGAT 

AGCTGTGGTCCTGGGTACCATGGCTGGTTTCAAAGCTGTGGAATTCAAAGGATAAA 

TTAATGAAGAAAACAAGCGGAGCTGAAGAAGAAAGTACAATATGGTGCTGTCTTCC 
TAATGAAATAAATTCACTAAATGGACATTAAAAA 

Sequence ID 825

AGACTCGAGCAAGCTTATGCATGCATGCGGCCGCAATTCGAGCTCGGCCACTTGGC

CAATTCGCCCTATAGTGAGTCGTATTACAATTCACTGGCCGTCGTTTTACAACGTC 

GTGACTGGGAAAACCCTGGCGTTACCCAACTTAATCGCCTTGCAGCACATCCCCCT 

TTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATCGCCCTTCCCAACAGTT 

GCGCAGCCTGAATGGCGAATGGAAATTGTAAGCGTTAATATTTTGTTAAAATTCGC 

GTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAAAA 

TCCCTTATAAATCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTTCCAGTTTGG 

AACAAGAGTCCACTATTAAAGAACGTGGACTCCAACGTCAAAGGGCGAAAAACCGT 

CTATCAGGGCGATGGCCCACTACGTGAACCATCACCCTAATCAAGTTTTTTGGGGT 

CGAGGTGCCGTAAAGCACTAAATCGGAACCCTAAAGGGAGCCCCCGATTTAAAGCT 

TGACGGGGAAAGCCGGCGAACGTGGCGAGAAAGGAAGGGAAAAAAGCCAAANGGAG 

CCGGCGCTAGGGCCTGGCAAGTGTACGGGCACGCTGCGCGTAACCACCCACACCCC 

GCCGNGCTTAATGCCCCNTTCAGGGCGCGTNCTGATGCCGNATTTTNTCTTACNCA 
TNTGTGCNGGNTT 

Sequence ID 833 

TAAAATAATGGCAAAAAACAAACAAAAAACAAGTTCTCTAAACAGAAAGGAAATTA 

CTAAAGAAGGAATCTTGAAATAACAGGAAAGAGGAAATACCACAGTAGGCAACATT 

ATGGGTAAATAAAACAGACTTTCCTTCTTTAGTTTCCTAAAATATGTTTGATGATT 

AATGCAAAAATTACAATATTTTCTTATGTAGCACTAAAGGTATGTAGAGAAAATAT 

TTAAGATAATTGTACTGTAAGCGGGAGATGACAGTGACATAAAGGCAACGTTTTTA 

TACTTCACTCAAACTTTATGTATTAATGTAATCCATAAAGCAACCAAAAAAGCTAT 

ACTAAGTACATTCAAAAACACAATAGATAAACCAAACAAAATTCTAAAGGATGTAC 

AAGTAACCCACTGGAAGCTGCAAAAAATGTAAACAGAAACTAAAAACAGAGAATAA 

ATGAAAAATTAAAAACGAAATGGCAGACTTAGGCCCTAATATACAAATTATCACAT 

TAAATATAAATGGTCTAAATACACCAACTGTAAGACAGAGATTAGCAAAGTCGATT 

TAAAAACATGACTCAACTACGTGCTGTCTACAAGAAACTCACTTCAAATATACCAA 

GATAGGAAGGTTGAAAGTAAAACGATGGAAAAAGATGTATCATGTGAACATTAATC 

AAAGGAAAGCAGGGGTGGCTATATTAACATCAGGTAAAATAAACTTT

Sequence ID - 837 nt: 603

TGAGGNTGGTCATGATGCANAAGCTACTCAAATGCAGTCGGCTTGTCCTGGCTCTT 
GCCCTCATCCTGGTTCTGGAATCCTCAGTTCAAGGTTATCCTACGCGGAGAGCCAG 
GTACCAATGGGTGCGCTGCAATCCAGACAGTAATTCTGCAAACTGCCTTGAAGAAA 
AAGGACCAATGTTCGAACTACTTCCAGGTGAATCCAACAAGATCCCCCGTCTGAGG 
ACTGACCTTTTTCCAAAGACGAGAATCCAGGACTTGAATCGTATCTTCCCACTTTC 
TGAGGACTACTCTGGATCAGGCTTCGGCTCCGGCTCCGGCTCTGGATCAGGATCTG
GGAGTGGCTTCCTAACGGAAATGGAACAGGATTACCAACTAGTAGACGAAAGTGAT

GCTTTCCATGACAACCTTAGGTCTCTTGACAGGAATCTGCCCTCAGACAGCCAGGA 

CTTGGGTCAACATGGATTAGAAGAGGATTTTATGTTATAAAAGAGGATTTTCCCAC 

CTTGACACCAGGCAATGTAGTTAGCATATTTTATGTACCATGGNTATATGATTAAT 
CTTGGGACAAAGAATTTTATAGAAATTTTTAAACATCTGAAAA 

Sequence ID - 839 nt: 71

ATTTATCTAATATTTGGTTTAATAAAATGTGAATAATGAAAAAAAAAAAAAAAAAA 
AAAAAAAAAAAAAAA 

Sequence ID - 849 nt: 622

TTTTTTTTTATTTTTTGAGAATGGAGTCTTGCTCTGCCGTCCAGGCTAGAGTTCAG 

TGGTGCGATCTCAGCTCACTGCCACCTCACCTCCTAGGTTCCAGAGATTCTTGTGC 

TTCAGCCTCCTCAGTAGTTGAGAATACAGGAACACGCCACCACGCCTAGCTAATTT 

TTGTATTTTTAGTAGAGATGGGGTTTCACCATGTTGGCCAGGCTGGTCTCAAACTC 

CTGGCCTAAGTGACCCACCTGCCTCAGCCTCCCAAAGTGCTGGGATTATAGGCGTG 

AGTCATTGTCCCCAGCCGGATGTTTTCATCTTGATTTGCCTTAGTTTCTAAATCTC 

ATCCTCTCCATTTTCTCCTGTTAGTAGTCACAGAGAACCAAATTCTGTCAAGTTAT 

GAAACTAAAGTCTCTCTTCCACAAGTCTTCCTGTGTTCTGCCTCAAGTGAACTTGA 

AAGAACATCAGTTTGTGGGAAGGTTGAAGACCGAATGATCTGCTGGGAAATCACTG 

AGGCATTGCCATTCTCTTGAGGAATTTCATTTTCATCGAAGTTTCGGTTTATATCC 

CTTTCTTGGTGAGTACTATTGCTGTTATGTAAATTAAATGAGTCGTCATCCTTCTT 
NTGAGC 

Sequence ID - 860 nt: 501

GTGAAATCACTTTCATGGATTATTAATGGATTTAAGAGGGCATCAATCAGCTCAAC 

TCAAGATTTCATAATCATTTTTAGTATTTAGATTGTGCCTCAAAGTTGTAGTACCT 

CACAATACCTCCACTGGTTTCCTGTTGTAAAAACCTTCAGTGAGTTTGACCATTGT 

GCTCTTGGCTCTTGGGCTGGAGTACCGTGGTGAGGGAGTAAACACTAGAAGTCTTT 

AGTACAAAACTGCTCTAGGGACACCTGGTGATTCCTACACAAGTGATGTTTATATT 

TCTCATAAAGAGTCTTCCCTATCCCAAGGTCTTCATGATGCCAGTAGCCATATATG 

ATAAATTATGTTCAGTGATAACTTAGTTATCAGAAATCAGCTCAGTGGTCTTCCCC 

GCCATGATTCACATTTGATGAGTTTTTAAAAATCAAAGTGATTTTGAAAATCTCTA 

ATGGCTCAGAAAATAAAAACATCCAGTTTGTGGATGACTATATTTAGATTTCT 

Sequence ID 864 

TTGTGTTTTTAGGACTCCTTATCTAAATTAAGGCAGAGAAGTTACAGTATTTATAT
CTGCATTAAATCTCAATTCCAGAAAAACCTTTTGAAAAATTATTTAATCCTCTGGA

AACTATTGATATGATACAGGAGAAATTTTCAGAAGTTTATTGAATAATTTAATATC 

ATTTAATAGGACACTCTGGCTTGTATATAAGCAGATACGTTACTCAGACTTCTTGG 

CTGTACTCTAAAATAATATATGTACTAGTCTCCTAAATATTACTAGCTCACCTTTC 

AAAATGCATACTAATATTTCAATGTCTTTCTTCAATTTGAAAAGCTCTTGAATATG 

TACTTGTGATAGCCCTAAGAGCTGAGATAATTATTTCCAGGAGGTTGAATCCCTGA 

TTCTTAACTGTTCAGCAATGCATAAGCAAGAGAGAATATGACATAAGAGGACCATT 

TCTACATTAGCCATTTTTTTTCACAAGATACCTATGTGAATACAGGGCACCTGGGA 

GGGTAAGTGGAGGACTATTTCTAACTATATTTATAAGCACATACTGATATTGGTGA 

ATCAAAACCTACAGCAGTGCTTCTCAGATGGGAAGGGAGACAATGTGTAAGGAGAT 
CAGGAATTCATTAG 



Sequence ID - 865 nt: 122

CCANAATCCACTCTCCAGTCTCCCTCCCCTGACTCCCTCTGCTGTCCTCCCCTCTC 

ACGAGAATAAAGTGTCAAGCAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAA 

Sequence ID 867 

TTTTTTTTTTTTTTTTTTTCAGAGTCACAGATATTGTATAGCTGAGGTAAGCATTT 

TACAACTTTTCAGACACAAGTAAGTACATAAATATTATTTTACAACCAACAATNTT 

TAATATTTCCACATTGAANAATAGATGTGATAATTAAATCTTTTATAAGGTTTTAA 

AAAGACATGAAACATAAACCTAATTATACATAAAAGAAAAGAATTTTAAACAAGAG 

CTTATTGNGATGACATTACTCATAACTTTTACCTTTAAAACCTTTTCTTGGGTAGC 

TATTCAAAAGTAAAGACCACAAGTTTTGTTGCCCANATTTCTTATGTTTNGTATAT 

TTAAGCTCTTTATTTATTGAACAGATGNGTCATTAATTCATTNGGAGCATTACTAT 

TATCAGTAAAATTTGATTTTTTTTTCCCCTCAGTCATAGGTAAATCAGCTCCACCT 

GGAATTTCTAAGGACCCAGTTTTAGTCAATATTTTCAAGTAATCATGACCTCAGAA 

ATAGTCTTAATTAAGATAACAAATATTAGCCATCAAAATGGAACCAAGACAAGATT 

CTAATGTTTGTAAACAGTCAATCCATATTTATGAATATTAGCATATATTGGNGAAT 
AGTTAAGGCAAAAGGGTCTAGCAG 

Sequence ID - 869 nt: 667

TTGTGTTTTTAGGACTCCTTATCTAAATTAAGGCAGAGAAGTTACAGTATTTATAT 
CTGCATTAAATCTCAATTCCAGAAAAACCTTTTGAAAAATTATTTAATCCTCTGGA 
AACTATTGATATGATACAGGAGAAATTTTCAGAAGTTTATTGAATAATTTAATATC 
ATTTAATAGGACACTCTGGCTTGTATATAAGCAGATACGTTACTCAGACTTCTTGG 
CTGTACTCTAAAATAATATATGTACTAGTCTCCTAAATATTACTAGCTCACCTTTC
AAAATGCATACTAATATTTCAATGTCTTTCTTCAATTTGAAAAGCTCTTGAATATC

TACTTGTGATAGCCCTAAGAGCTGAGATAATTATTTCCAGGAGGTTGAATCCCTGA 

TTCTTAACTGTTCAGCAATGCATAAGCAAGAGAGAATATGACATAAGAGGACCATT 

TCTACATTAGCCATTTTTTTTCACAAGATACCTATGTGAATACAGGGCACCTGGGA 

NGGTAAGTGGAGGACTATTTCTAACTATATTTATAAGCACATACTGATATTGNTGA 

ATCAAAACCTACAGCAGTGCTTCTCAGATGGGAAGGGAGACAATGTGTAAGGAGAT 

CAGGAATTCATTAGTCACCTTTCAGATGGTTTAATGCATACAGCTGTACCG 

Sequence ID 870 

GGAGTTTGAGCAGATCCTTCAGGAGCGGAATGAACTCAAAGCCAAAGTGTTCCTGC 

TCAAGGAGGAACTGGCCTACTTCCAGCGGGAGCTGCTCACAGACCACCGGGTCCCC 

GGCCTTCTGCTCGAGGCCATGAAGGTGGCTGTCCGGAAGCAGCGGAAGAAGATCAA 

GGCCAAGATGTTAGGGACACCAGAGGAAGCAGAGAGCAGTGAGGATGAGGCTGGCC 

CATGGAT^CCTGCTCTCCGATGACAAGGGAGACCATCCCCCACCCCCGGAGTCCAAA 

ATACAGAGTTTCTTTGGCCTATGGTATCGGGGTAAAGCTGAATCCTCTGAGGATGA 

GACCAGCAGCCCTGCACCCAGCAAGCTAGGGGGAGAAGAGGAGGCCCAACCACAGT 

CTCCAGCTCCTGATCCGCCCTGTTCTGCCCTCCACGAACACCTTTGTCTGGGGGCC 

TCAGCCGCCCCAGAGGCCTGACTTAGGGGTCTGGCTGTGGAAGGATGTGTGGCCTC 

AAATGAGGACAGGGCTCCCGCCTTCACAGCCCTCGCCAGGGGTCTGCCCCAATCCT 
GGCCTGCATCAGGCAAGGACGGGGTCTCAGC 

Sequence ID - 871 nt: 642

GCAAGTCTTCAGTATGTACATTTATCCCCTAGAAGAAGAAAAATTAGTTGTGCATG 

AAAAAGAAACATTAACTGCAAAGCTAAATGCTCACACTCTAAATCAGTGCTCTCCA 

AAGTACAGCAGGCGGGAAAAGAAAATGGTAGATTTTTTTCTTCCAATTACTTTAAC 

TTATTCTTTTTAATGGACACTTCATACATAAATATATTCACAATATATTAATATAT 

ACATAATGTATAAGCATACATATTGAATGTGCAGTCAAAAAATGTACTAATGGAAT 

GCTCTACCAAAACAAGTTCACGTTCATCTGTAAAATGGGAATAATATTTTTAAAAG 

GCATACAGTCTGAACATTTTTAGATTATTCATAAAATCTATTCAGAAAGTTAAACT 

AAAAAATTTAACGTATGCCTATAACAAATTTTGTACTTAATGTAATTGNTTTTCAT 

CCTGAGATCTAATATCCTCGTTTTTAAGTAGAGCCACTTGTTTGCTACAGTTTAGT 

CAAAACGTTAACATTAGATGGGTAAAGTAATATGAAATCTTTCTACTACTCCAAAA 

TAGAAAACAGAACATTAAAAAGATAAAAATTCAAACATACTTACCAGTAGATTTTC 

AACTGNGCAAAAGCTCATTGCATGGG 

Sequence ID 873 

GTTTTCCACCGTGAAGAGAACATTTCCTCTGGGAATGACAAAGCCCTCAGGAACNG
CTTTTATTTCTATTGGAAGATGCCCATCATACTTCTGGCAGGATAAAATGATAAAT

TTATTTATTCAACAGATGATACTCAATTCCCTGCTGTTTTACTAAAGGTTCTTTAC 

GTTTTATAGAAGCTAAATTTACTGTCATAGAAATTGCAATTGTAGATGTTACTGTA 

ATCTAGTCAGAATATCCTTATCCTTCTAAAATAAAACTAGTTAAAATTATTAACAT 

ACGTACTGATATTAATTTTTAAGTTTAATGCTGCCACGTGCTTCTGCTAAGAACAT 

TTATCACTACAAGTGGC71GAAAATTCCAAACTCATCAAAACCAAACTGTTGCTTCT 

TCCCTGCTTTTTCAGAAAATGAGAAAGGATGACTTTATTCCAACATATTCTAAAAG 

TATTCCAAGAACACTACCTTTATTCTAAATTCGTTATTTTCACAAAATAAAC^CTG 

CAGATTGAAAGATAAAGGATTGCTATTAAAGAACAAAAGAAAACAAAACCGAGAGA 

GAAGGAGAGCTAGGGAAATCCCTGCANAANAACCGAATANGGTCCCTCTATTCTGG 

GCCGGGGCCTGAAACTATGAAACAGGCCAACACAGAATCTTGGCA 

Sequence ID 875

CCTCTGACTCGCTCAGCTCACCCACGCTGCTGGCCCTGTGAGGGGGCAGGGAAGGG 

GAGGCAGCCGGCACCCACAAGTGCCACTGCCCGAGCTGGTGCATTACAGAGAGGAG 

AAACACATCTTCCCTAGAGGGTTCCTGTANACCTAGGGAGGACCTTATCTGTGCGT 

GAAACACACCAGGCTGTGGGCCTCAAGGACTTGAAAGCATCCATGTGTGGACTCAA 

GTCCTTACCTCTTCCGGAGATGTAGC^AAACGCATGGAGTGTGTATTGTTCCCAGT 

GACACTTCANAGAGCTGGTAGTTAGTAGCATGTTGAGCCAGGCCTGGGTCTGTGTC 

TCTTTTCTCTTTCTCCTTAGTCTTCTCATAGCATTAACTAATCTATTGGGTTCATT 

ATTGGAATTAACCTGGTGCTGGATATTTTCAAATTGTATCTAGTGCAGCTGATTTT 

AACAATAACTACTGTGTTCCTGGCAATAGTGTGTTCTGATTAGAAATGACCAATAT 

TATACTAAGAAAAGATACGACTTTATTTTCTGGTAGATAGAAATAAATAGCTATAT 

CCATGTACTGNAGTTTTTCTTCAACATCAATGGTCATTGNAATGTTACTGATCATG 

CATTGGTGAGGNGGTCTGAATGTTCTGACATTAACAATTTTCCAT 

Sequence ID - 876 nt: 115

AAACTTTTGTGGCAACAGTGCACTAATTTGGATAATGTTTGTTCCCAATAAATTAA 

GAGCCAAATTGTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
AAA 



Sequence ID - 878 nt: 634

GCCAGGCTTTGTGAATTACAGGACATTTGAGACAATCGTGAAACAGCAAATCAAGG 
CACTGGAAGAGCCGGCTGTGGATATGCTACACACCGTGACGGATATGGTCCGGCTT 
GCTTTCACAGATGTTTCGATAAAAAATTTTGAAGAGTTTTTTAACCTCCACAGAAC 
CGCCAAGTCCAAAATTGAAGACATTAGAGCAGAACAAGAGAGAGAAGGTGAGAAGC 
TGATCCGCCTCCACTTCCAGATGGAACAGATTGTCTACTGCCAGGACCAGGTATAC
AGGGGTGCATTGCAGAAGGTCAGAGAGAAGGAGCTGGAAGAAGAAAAGAAGAAGAA

ATCCTGGGATTTTGGGGCTTTCCAATCCAGCTCGGCAACAGACTCTTCCATGGAGG 

AGATCTTTCAGCACCTGATGGCCTATCACCAGGAGGCCAGCAAGCGCATCTCCAGC 

CACATCCCTTTGATCATCCAGTTCTTCATGCTCCAGACGTACGGCCAGCAGCTTCA 

AAAGGCCATGCTGCAGCTCCTGCAGGGACAAGGACACCTACAGCTGGCTCCTGAAG 

GAGCGGAGCGACACCAGCGACAAGCGGAAGTTNCTGAAGGAGCGGCTTGCACGGCT 
GACGCAGGCTCGGCGCCG 

Sequence ID 879 

GTTGCCGGGTCCTGTGATAACTCTGTTTAACATTTTGAGGAACTGTTGAATGGTTT

TTCACAGCAGCTGCCTCATTTTTTATTCCCATCAGCAGTACTTCTTGGTTCTAATA 

CCTCCACGTTCTCGCCAACACTTGTTGTTGTCTGTAATTTCGTTGTTAGCCATCCC 

AGTGGGGATGAAGTAGTATCTTACTGTGGTTTTCAGTTGCGTTTCCCTGATAATTA 

ATGATGGTGAACATCTTTTCATGTTCTTGTTGGCCATTTGTATGTCTTCTTGGGAA 

AAAAAAAATGTCTGTTCAAATCCTTTACAAAGTATTTATTTTTTATGTCAACAATA 

TAACCACTCAGTACACTGCTTTTTANACAATGATCTTTTAAAGGTTTGTTTACAAC 

ATTTAGCACTTGAAATTTTAAGGTTATGCCCTCAAAAAAATTGCTGAGGGAGCTAA 

GCTATGAAGATGCAAAGGCATAANAATTATACAATGGACTTTGGGGGAATCCAGGG 
AAAGGGTGGGAGGGGGGTGANGGA 

Sequence ID 881

TCGACTCTGATTTTTTTTTCTCCTTCCTCGCAGCCGCGCCAGGGAGCTCGCGGNGC 

GCGGCCCCTGTCCTCCGGCCCGAGATGAATCCTGCGGCAGAAGCCGAGTTCAACAT 

CCTCCTGGCCACCGACTCCTACAAGGTTACTCACTATAAACAATATCCACCCAACA 

CAAGCAAAGTTTATTCCTACTTTGAATGCCGTGAAAAGAAGACAGAAAACTCCAAA 

TTAAGGAAGGTGAAATATGAGGAAACAGTATTTTATGGGTTGCAGTACATTCTTAA 

TAAGTACTTAAAAGGTAAAGTAGTAACCAAAGAGAAAATCCAGGAAGCCAAAGATG 

TCTACAAAGAACATTTCCAAGATGATGTCTTTAATGAAAAGGGATGGAACTACATT 

CTTGAGAAGTATGATGGGCATCTTCCAATANAAATAAAAGCTGTTCCTGAGGGCTT 

TGTCATTCCCAGAGGAAATGTTCTCTTCACGGTGGAAAACACAGATCCAGAGTGTT 

ACTGGCTTACAAATTGGATTGAGACTATTCTTGTTCAGTCCTGGTATCCAATCACA 
GTGGCCACAAATT 

Sequence ID 883 

TCATTTACATTAATACTCAAAACTGCTCGATTAAGCAGGTGCTGTTCTTATCGCCA 
TTTTGCATATGATGAGAAAGGGTAAGGTCACCCAGCTAGTATTTGGCTCACAGCAG 
GCCTTAAGACTTGGTTTGTGTGACTCATCAGTCCACGCTCCTAAAACCACTAAGTT
GTTCTACCCTTTAATGTTGAATTAACATTGGATAGTGTTCAAGTTTANATGGGTGG
GTGAGGGCCCAAGGACCTTTCAAACTCAGATCTCTTATTTAATAACCTGGTCCCAG 
ATCCT^TTCCTCTGTCGAAGAGGAAGTCATCCTTCAGTGGCTATTCATTGTGGGGTT 
AAGAGCGCAGACTATGAATTCAGrcTTTTTGGGTCCCAGTT^CCAGACC^S 

GAGTGCCCCGAGTTTACTTACTTGTAAAGGTAGGTGGAGGTAATATAATTAAATAA 

acttaaaaaactaattaaaaacaaaacaaatgaactaaggtcttaggatI^gg^ 

GTCTATTTTGCGCCAAATCACATAATGTCTATTGTTGTGTGTTGGACTATAGGA^T 

gtcctttaacagggaagggtttatttctgtaatcaagtctgtcaatattatgaccI 

TGTTGATAATAGCTACCTTTAATTGAGGGCl-rCCATGTGCCAA

Sequence ID 885

TCAGTGGAAAAGGGCAGGTTGAATCAAGGTGAATCAATCTGAAATTGAGCACACCT

GCCTGCCATCGCTGTTCCTTCAACTGAGTGCTGGACATCATGGGCTGTGTGTGTGA 

GAGAAAAATCCCGGTGCTTGGTGTCCTTGGATGACATGGAGTTTTGCATGTAGATC 

AATTTAAAATGTACCTCTTGTTTACATAATTTGCATAATTTTAAAAGATAATGTTG 

CCAAACTTTGGAAATGTTAATGTTCANACTGAAAATCTCCACTACATGTAACTTTC 

TTCCTCTGGATCAGTGGCATGGCTTATAATCCCAGCCAGTGGTTTGAACTGTTCCA 

GTGTCAACTGCCATGTGCTCTGCTTCAAGGGGGAACTAGGCTTTTGTGAATTTTTT 

GTACATAAGTATTTGTTACAAATATTTTAGCAAATGCTTTCTATTTCTCTTGCTTG 

TGCATATCTTGGCTGGCGTTACAGAAAAATAGTGTAAACATTATTTCCTTACCGGG 
GAATGAGGGTTTT 

Sequence ID 887 

AGCACCTGGCACAGAGTAGTAGCTAACACAGATGTTAATTTTGCTGCGTCAAATGT 

TTTCACTTTGAATCTCTCTTGAGTATTGTTCTCCTTATTGATTACATGATGACATC 

CTGTTTTCTCTCCCTGACCTTTACTGTTTGTTTAGAAAAAAAAAAAAAAAAAAAAA 
AAAAAA 

Sequence ID 889 

CAGAGAGCTTGTTCCCTCCCTCCCTGTGCATGCAAACAAGAGGGCATGGGAGCACA 
CAGAGAGATGGCAGCCACCTACAAGCCAAGAGGAGAAGCCTCACAATCAAACTCTC 
GCTGCTGGCGAGAGTCTTGGACTCTGTCTTGGACTTCCAGCCTCCAGACTGTGAGA 
AACAAATTTCTGTTGTTTC^GCTTCTCAGTCTCTGGTGTTTTGTTATTGCAGCCTG 
AGAAC^CAGCTGTACNATTATNAGGGAAACAGAAAACACTGATACTTAACAATGCT 
AATGCAATTATTTATTTGCTTTTCAGTCTCTACAAAACGTTCTAAAACACTAATCT 
AAATATTAACAGTAAAATATTTGCATAACTAATGGAAACTAAGAAATCATATGACC 
AATATTTCACTTATTGGTAATCTTACTCTACTGATTTCCCCCCAGACTGTGATTTT
TGAACTTCCTTGCCTTTCTCCTGTCTTTCTGNGTTTATTCATGGAATTCCAGTTAT
CTGGGCTTGAAATTG^ 

ATGAGATAAATGTTTCTTTTTTCTTTCTGACTGCATTAAATCAGATACAACTCAGC 
ATTAAAAAGCTATCTTTGNAAAATGNTGGTACTAATAAATTAGTCTTA 

Sequence ID 890

CCAGTTC^TTCAGTGAAGTCATGAACTTGAAATTGGCCATGATCAAAAAGTAT 
TTAAATCACAGAAGTTGCAAATGCCACAAATCAAGGTCTTTTTCTCTTGGAGAACC 
TGTTAAACATTTACCAACTCACGACCGCCATGCACCCAATACTGCAATAGGTCTAT 

AGATGCAGATACTGTCTCCATGAATCTTATAGGCTAGAAAGGAAATAGATAAGTAG

TCCTACCAGAAGAACATGATGAAGGCATTTGTGGTAAACAGAATGATGGCCCCCCA 

AAGATGTCCACATCCTAATCCCTGAAGCCTATGAATATACTACTTTACTTGGCAAA 

AGGGACTTTGCCACAGGTTTTTAATTAAGGACCTTGAAATAGAGAGATTATCCTGG 

ATAATCCAGATGGCCCCAGTGTAATCCCAAGGGTCCTCACAAAGGGTAGGAAGGAG 

AGCCAGAGTCAGAGAAGGAGACGTAGCAATGGAGGCAGAGGTCANAGAGAGATCTG 

CAGATGCTGCTGTGTTGGCTTTGAAAATGAGGAATGCAGGTGACGTCAANGNGCTA 

GATGATGCAAGGAAACAAATAATCTCCTATGAACCCTAGGATGGGCATTATTATGA 

GTCCTATTTTATAAACAAGGAACTGACNTCCAGAAAGATAAATGC 

Sequence ID - 891 nt: 626

GGCAGAGGTTGCAGTGAACTGAGATCATGCCATTGCAATCCAGCCTGGGCAACANG 

AGTGAGACTCCATCTCAAAAAAAAAAAAAAAAAGACAAGAGTNTCCACTCTAAACA 

CTTNTATTCAACATAGTCCTGAAAGTCGTAGCCACAGCAATTTAACAAGATAAAGC 

AATAAAATGTATTCAAATAGAAAAAGAGGAAGTCAAATTATCTTCACTGGNGATAT 

AATTCTCTACCTGGGAAACTTCACCGAAAAAGATTTCACCAAAAGATTTCTAAGCC 

TAAATAATGACTTCAGCAAAGTCTCACCATACAAAATCAACATACACAAATGAGTA 

GCATTTCTGTGCACCAATAATATTCAAGCTGAGAAAAAAAGAACATGGTTCTATTT 

ACAATAGCTACAAACAAAAAAATATGTACCTAGTAATACATTAAATCAAGGNGGTA 

AAATATCTNTAC^CAAGAACTACAAAACTGCTGAAAAAAAATAGAGACACGCAAA 

TAAGTAAAAAGGCACTCCATGCTCATGAATTTAAAGAATCAATATAATTAAAATGT 

CCGNGCTGCCTAAAGCAACTTACAGATTAAAGGCTATTTCTCTCAAACTATAAATG 
CACCTTTTTA 

Sequence ID - 893 nt: 585 

GTCATTGCTGGGTGGCGCCAGCCCTCAGACTTGCCTCTTTGCAGTAGGAAGAAGGC 
CTCCCCACATACCTTCCCACACTCATCACCTTAAGCCAGACTCGGTGTCCAGTGAA 
TATGACCATCTCTTGCCCATTTTCTAATGAGTGTTTTCATTAATGAGTTATAAGAA
TGTGGTGGGTAAATCTATGGGCTTTGAACTAGTGAATCAACTTGGTTTCAGAATCT

GGCACTGCTACTTACTAGTGAATTTAAGCAAGTTATTTCACCTTTCAGAGTGTGAG 

TTCCCTCATGCATACAAGGAAGATAAAAAATAATGTNTACNASAGTATTGGAGTAA 

TTAATACATGGAGAACTACATGTAAAGCGTTTAGCATGATGTCTGACATATTAAGC 

ATCCAATATTAGTNGCTTGCAGAATTATTAGTAAAAGAGATTGCTTCTGAAAGCCA 

TTCCAATTCTTAAATTTTATAATGCCACATTTGAGGTCACCTGAAGTCGTGTATAA 

CATGTGTACATTTTTGCGATTTATTTTTTCAATTCCCANATTAAAGGCATAGAGAT 
ATCCTAGCNANGGACTCCAAGTGTG 

Sequence ID - 895 nt: 560

GTAATTGCAGCCTGGGCAACGGAGTGAGAGACTGTCTCAGGAAAAAAAAAAGAAAA 

AAAACTACTGAGGTAGTTGAATATATCCTCCATTCCCCATTTGTGGATTAGTTAGT 

AAATGGGGCATCTTAGGGTTTAAATATGTCCAGGGTCACTGAGGATCAGATCCTAG 

GGTTCCTTTGACTCAAGGCTTTTGTCTCAGCAAAACGTCACCTTCCAGCAGGAAGG 

CTTTCTCAGGCAAGTAGCAGGGTGGCTACTATGTATCGCTTCTTTATTTTTTCTTT 

TTTAAAATAATGCAGGCACCGTGCGCATAATTTAAAAAATCAGTGCTAAAACCCTT 

AAAAAAAAAAAGCTGTTCTCATCTCCTGTCTTTCTTTTTTTTTTCTTTTTATTTTT 

TTCTTTTATTATTATTATACTTTAAGTTTTAGGGTACATGTGCACAACGTGCAGGT 

TTGTTACATATGTATACATGTGCCATGTNGGTGAGCTGCACCCATTAACTCGTCAT 

TTAGCATTAGGTATATCTCCTAATGCTATCCCTCCCCCCTCCCCCCTTTTTTTTTT 

Sequence ID 896 

GGGAATGTCTTAGGCACTGGGACTGTAAGTGCAAAGACCCTGTGGCACAAGGGAAT 

GTTAATTATCTACCTTTCANAAACTGGAANAAGGCCTAGCCTAGAGCATTGAAAAC 

AATAAGGGAAAGGAGGAGTAAGGCTGGANAGATAGGAATGGTTTAAAGTCTTTGTT 

AAAAATTTTTTTAAAAAAATCTTTATCACT^GAAGAGGATTGGCNTGATCAAATTT 

GACTTTTAAAAANATTACTTGGGTTGGGCATGATCAAATACTACTTAGGGAGATTA 

GTTTANATGATAATGGCATTCTGGACCANAGTGGAGTCAGAGGTGAAAAGAGGTAG 

ATATTCCANAATTGAGGGATTTGTGAGGTGAAATCATTTGTTACAGATATTAAAGG 

ATAAGGAGCTTTGTCAAAGGGGATCTTAAGTTTCTGGTATGGTAACTGGGTTAGAG 

AGCCCTGGAACATGACCAGCTTTAAGGGAAGAGAGCTTGAGCTCTGTTCTTGTTAA 

GCTCAGTTTGAGATCTTTGTGGAATCAAGTGGAGAGGTCTAAGCAGGGAACTGGCT 

TGGCTAGGCTGTAAAGATGAATCTGAGAGTCCCAAGAATATGGTAATTATTAATAA 

AAGCCTTAGGTANATGAAATTGTTTTGGG 



Sequence ID - 897 nt: 509

GCAAATCTACACATTTGATTAAATGATAGGGAACTATGCACACACATAATAGATAT
AATGCTAGTTTCTTGGTTTTGATATTGTACCATAGTTATGTAAGATGTAACCATTG

GGGGAAACTGGGTGAAGGCTACATGAGACCTCTCTGTACTTAATCTTTGCAACTTA 

TGTGAATCTATAATTATTCCAAAATAAAAAGTTTTAAAGAACCTAAGTATCCTTAT 

TACTGAGGGTCATCGTGCTAGACAGCAAGGTTGGGCCAGAGCTTCTAGTTATTTAA 

AATACTAAATACCAGCCTGGGCAACATAGCAAGACCCTGCCTCTACAAAAAGCAAA 

AAAATTAGCTGGGCATGGTGGTACATGCCTGTGGTCCTAGTTACTCTTGGAGGAGT 

CTGAGGTGGGGAGCTTGAGCCTAGGAGTTTGAGGCCGCAGTGAGCCTTGATTGTGT 

CTCTGTACTCCAGTCTGGGCCACAGAGCAAGACCCGGTCTCTAAAAATAAATAAAT 
AAATA 

Sequence ID 898 

ANTGCACTCCAGCTTGGTGACAGAGGGAGACTCCATNTTAAAAAAAAAAAAAAAAA 

AAAAAAAGGGAGTAGCTTGAAGCCACATAGTAGTTAGTGGTAAAGGCCACCCCTTT 

TCCCACAACTCACACCAGCACCACAAGCTAGCCTTTNTAATTTCCAAGCCAGTGCC

CTTTCAACGCACACACCCCTGTGTCAGTTCCCTTTCTGCTGCAAGCTCTCTGGAGG 

CAGATACTGTTGAGTCCCTGGCCTGCCTATGAGAACGGCTCATGATCTCTATTTCT 

TCTGCTTAATGACCATCTCGAAGTAACAAGTTTAGCCTAAAATAAACTTGCTAAGT

TAGCAAAGGAAGTCCTTAGCAGCCACCATTTCTCGATTCCTCCATCACCTCCCCTG 

CCCCTCAACTCCCTCATTTCTCCCAAGATATGGGCTCCAGGCTGGGCGCGGTGGCT 

CACGCCTATAATCCTAGCACTTTGGGAGGCTGAGGTGAGCAGATCACTGAGGTCAG 
GAGTTCG 

Sequence ID 899 
TCNTTCGGAACGCGCC 

Sequence ID 900 

CTGGAGGGATGGGTAGGATTTTGACAAGAGTGGTTGAAGGTATTCTAATTCACTTA 
GTACCTACATGTGCGAGGCAGCATGAAGGCAAAAAAGCCTGGGGCATGTTCAGAGA 
ATAGCAAGTATTCTAGTTTGAGTGGCACCTGGTACGTATATAAGGGAATAGTAAAA 
GATCTGGCTGGAAAGGAAAAGTAGGGGCAGGTTACGAAGGACCTCTGAAAGTCAGA 
CTGTGGAACTGGAACTTTTATCAGGAAGCAGTAGTTAGTTTTTTCAAGCAAAAGCT 
AATTAGAGTTGATATTTAGGAGGATGAATCTAACAGTTGTGTGCAAGGATGCCTTC 
AAACTGAGTGAGACTAGTACTGGAGACTGGTTAAGAGACTACAACAATAACCTGAG 
TAAGAATTAATACAGGCCTGACCTAGTTTTGAGTGAGTAGGATTGGAAACAAGAGT 
TTTAGGTATTATAGGATTTATGCATATAAAATGGACTTGACAGAACTTGAAGAAAG 

AGAAAGTGTCAAAAGGACACAGAAAGTGAGGCAGGATATCTTACAATGTTAAAGGA 
AAGGAATAATAGAAGTTAC

Sequence ID 903

GGA^CATAAGCTTGTTTCAGTACACTCACGCTGTAGATTAATTCTGATATTACAT 

ATCTCCATCAGACTTTGTACCCTCTCTCTTCCATCCCTTACCCTTACCGATTAGGT 

TGGTATTACCTAAAAATCCATAGAAAATGTCCAGGTGAATTGCCTTATGCTTTCTA 
CCCCATAAGGTATAATT 

Sequence ID 904 

CTCTGTGGTGTGAGAACACAGTGGGTGACCAAGGCTTTCCAGATGAACCCAAGGAA 

AGTGAAAAAGCTGATGCTAATAACCAGACAACAGAACCTCAGCTTAAGAAAGGCAG 

CCAAGTGGAGGCACTCTTCAGTTATGAGGCTACCCAACCAGAGGACCTGGAGTTTC 

AGGAAGGGGATATAATCCTGGTGTTATCAAAGGTGAATGAAGAATGGCTGGAAGGG 

GAGTGCAAAGGGAAGGTGGGCATTTTCCCCAAAGTTTTTGTTGAAGACTGCGCAAC 

TACAGATTTGGAAAGCACTCGGAGAGAAGTCTAGGATGTTTCACAAACTACAAAGC 

TGAAGAAAATGAAGCCCTATTACTTGTTTGTAAGATTTAGCACCCTTCTGCTGTAT 

ACTGTACTGAGACATTACAGTTTGGAAGTGTTAACTATTTATTCCCTGTTAAAATT 

TAACCTACTAGACAATGATGTGAGTACCCAGGATGATTTCCTGGGGCACAGTGGGT 

GAGGAGATGGGGACAGGTGAATGGAGGAGTTAGGGGAGAGGAAAAGTGGATGGAAG 

TGTCTGGAAAGGGCACCAAAAAAGTCTTCCAGGTCTGATCCTGTTTCTTGCTCTGA 
GTGCTAGCTACCACTGTGTCACACTGTAACATN 

Sequence ID - 905 nt: 555

CTCAGCTCTTGCCTGGTCACCTTGTGGCTTTTACCATCCTCATCCCCTGTGCCACC 

CACATCCTGCCACTTCTGCATGGAGTTGGGGTGGGGCCATTGGAGAAAAGAGGTTA 

AACAAGCAGTAATTTACTTGAGTACAGTCTTTGAGCCAATGAAATGCCAGTCATCA 

TTTCCCAGGGGTACTTGTCATCTTGTCAACAACCCGCTGATAATGCTCCTTCAATG 

TGAATAGCAAAAGTAGGGAGAGACGCTGAATGAAGAAGATGCCTACCCCTCAGGAA 

GACTGCTGTCCGCCTCCAGGCCTGCATGCACACACCCATGCCCACCTGCACCCCCA 

GCACCACGCCCACACTCACTCGCACACACCCACATGCCAGTGTTTTGGGGTTGGCA 

GCCTGGACACTGCTGAGGCAAACACAAGTCATCAAGCATAATTCTCATTCTCTCCT 

TCTGTCTCTGTTTTAGTTACAGGAATTTGGTCAGTTTAGAGGATTTAATAAGTCCG 

TGGAAAATTTGTTTGTGTCTCTTGCTACCCACGTGAAAAGTAAGTGCATGCTTCAT 

GATGTGTTTTCCCACTACCTTCCAGGCCAGCCGAGCCCACTGGCCANGGCCTGGCC 

CGGTGACCTCGGTTGACACTGTCCTCANGCCACTCACTT 

Sequence ID 906 

CAGAATTTCATGTTTATGCTGCACAAGGCCTGTATTTTATAATGGTGGCTCTTTTG 
GACGATGACTTCCTCGATGGTGAAACTTCCAGTAATCTCCCTCATCATACTGAAAT
GATATCAGTATATCATCAGAACACCATGGAGCTTGTCATTTGAGGGACACAGCTTG
CTTGTGTGCTTGGGAAAGAAGAGGTTTAGCATGGTTTCAGGTCAGTGATGAGTCCA 
ATGATCTCTGCAAGTTCCCTTAGCTCTGANAATTCTGATGTCATATGCACTTCTGC 
CGCCAGAGTTGCTGCTTACTGGATGCGTAAGAAGAAAAGAAAAAAAAAAAAAAA 

Sequence ID - 907 nt: 582

CTTCCATTGGGGGTAAAGATCAAACTTTAGGCGAGCCAGGTCTGTATCTCCATTCC 

TGTCTCTGACTGCTTCCCTGTAGGGATTGTCTGCAAGCGCACACCTGCATTTTCTT 

GTCCACAAGTCTATGCTCTAACTCTGTCACCTGCATGGCTGCAAATTAGCTTCCTT 

CTTCCTGCCCTCTTCTCTCTAGCTTGGATTTTGAATTTGAATGGCAGGCATGGGAT 

GTCCGTGTGTGTGTACTGCTGATGTGTACAGCCGCTTGTTAGCGCTCTCATTGTCT 

TCAAATGTAAGTCATTTTGGCTGGGTGCGGTGGCTCTITGCGTATAATCCCACGCTT 

TGGGAGGCTGAGGTGAGCTGATCATTTGAGGTTAGGAGTTCGAGACCAGCCTGGCC 

AACATGGCAAAACTCCATCTCTACCAAAAATACAAAAATTAGCTGGGTATGGTAGT 

GCACGCCTGTAATCCCAGCTACTTGGAATGCTGAAGCAGGAGAATTGCCTGAACCC 

ANGAGGCGGAGGTTGCGGTGAGCCAAGATCACGCCACTGCACTCCAACCTGGGTGA 
CAGAGCAAGGCTGTGTCTCAAA 

Sequence ID 908 

ACCTGACTTCAAACTATACTACGAGGCTACAGTAATCAAAACAGCATGGTACTAGT 

ACAAAAACAGACCAATGGAACAGAATAGAGATCTCAGAAATAAAACTGCACATCTA 

CAACCATCTGATCTTCAACAAACCTGACAAAACGAGCAATGGGGAAAGGATTCCCT 

ATTTAATAAATGGTGCTGGGAGAACTGGCT-AGCCATGTGCAGAAAATTGAAACTG 

GACCCCTTCCTTACACCTTATAC^AAAATTAACTCAAGATGGATTAAAGACTTAAA 

TGTAGAACCC^yVAACGATAAAAACCCTAGAAGAAAATCTAGGCAATATCATTAAGG 
AG&TAGACATGGGC^AAAATTTCATGATGAAAAC^TC 

GCAGAAACTGACAAATGGGCTTCTGCACAGCAAAAGAAACTATCGTCAGAGTGAAC 

AGACAACCTACAGAATGGGAGACAGTTTTTGCAATCTATCCATCTGACAAAAGTCT 
AATATCCAGAATCTACAAGGAATTTAA 

Sequence ID 910

CAAAAAACAAGAATTACCCGGGCTTGGTGGTGCATGTCTGTAGTCCTATCTACTCA 
GGAGGCTGAGGCTGAAGGATCACTTGAGCCCAGGAGTTTGAGGCTGCAGTGAGTGA 
GCCATGATCATGCCAGTGTACTCCAGCCTTGGCAGACTGAGCAAAACTTGGTCCCT 
CGCAAAATGTTGAAGCCCAGTTTTCACTATTAACCTGTATTTCAGTTTCCCCATGC 
TAACTTTGAAACACTGGGGCTGGCCTGAGGGTATAAAGGCTTATTCAAACTCAGTA 
ATTTAAACTTAAAATCCTAAGGAACTTCAAAAAGTGTAATCTAGTCCAAATGGGGC
ATCAATTCTAAAGC^^^

ATAACTTATCTTTTTATGACTAAATCCAAGTCCTTAGTTCCTGTTGGAaTT^^^^A 

TTGTATAAATAA 



TCATATTTAAAAATTGATGCTTTGTTCTATAATTAATGCTTTGA



CTGGNTCCCAGTCCTGTNCTTAAAATTCTAACTCGAC -TCAGAACTAC

Sequence ID - 911 nt: 595

GAGGGTGTAGAAGAGAAGAAGAAGGAGGTTCCTGCTGTGCCANAAACCCTTAAGAA 

AAAGCGAAGGAATTTCGCAGAGCTGAAGATCAAGCGCCTGAGAAAGAAGTTTGCCC 

AAAAGATGCTTCGAAAGGCAAGGAGGAAGCTTATCTATGAAAAANCAAAGCACTAT 

CACAAG^TATAGGCAGATGTACAAANCTGAAATTCGAATGGCGAGGATGGCA^G 

AAAAGCTGGCAACTTCTATGTACCTGCAGAACCCAAATTGGCGTTTGTCATCAGAA 

TCAGAGGTATCAATGGAGTGAGCCCAAAGGTTCGAAAGGTGTTGCAGCTTCTTCGC 

CTTCGTCAAATCTTCAATGGAACCTTTGTGAAGCTCAACAAGGCTTCGATTAACAT 

GCTGAGGAT^TAGAGCCATATAITGCATGGGGGTACCCCAATCTGAAGTCAG^ 

ATGAACTAATCTACAAGCGTGGTTATGGCAAAATCAATAAGAAGCGAATTGCTTTG 

ACAGATAACGCTTTGATTGCTCGATCTCTTGGTAAATACNGCATCATCTGCATGGA 
GGATTTGATTCATGAGATCTATACTGTTGGAAAAC 

Sequence ID - 912 nt: 651

CATTTCCAGAGTTTATGTGAATTGAATTGAACTATGGTTTTATGTTACTGTCAGTA

GAATGAAGTACGAATATTTGAAAAATACACCTTCAACTTCAAAGTGATTCTTGACA 

AAAATTATAAGGAATCATTTTGGACACATTTTCTGGTAGAGCCTTGTAAAAATTAA

AACCAAGTGTTGTTTTCAAGAAGAACTGTAATACATAATCAGGAATTTGAGTAGGG 

AGATTATTTTGTTATTTAAAATTAAAGTGGCTGTGTAGTTTTAACTTTAGTATTGC 

AGGTAGAGTAAGCTTACATGATAACAAAAATCTTGGTCTTAGTGACTTAATGATTC 

TGATATTTATTGATTGATTGGTTATCATTCCAAATATTTTAAAAGATAATAGCTGG 

CTGGGTGCGGTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCAGGACGGGCG 

GATCACGAGGTCAGGAGATCAAGACCATCCTGGCTAACACGGTGAAACCCCGTCTC 

TACTAAAAATCAAAAAATTAGCCGGGTGTAGTGGCGGGCACCTGTAGTCCCAGCTA 

CTCAGGAGGCTGAGGCAGGAGAATGGCATGAACCTGGGAGGCGGAGCTTGCAGTGA 
GCTGAAATCGTGCCACTGCCTCCACCTGGCGACAA 



Sequence ID 913 

GTGAGGTGGGGACTTCATTCATTGTCCTATTTCTATCTCCACTTTGTGCCTGGAGA 
GCTTTCAGGGGAGGTGGAGGAGGAGGGTCTGCCAAGCTACTGCAACATCTGTCACC
CACTATACCCAGTTACTTGGGGGAGGACAGACACTGTGGTGTCATTAAAGTTGTTT

GAACCAAAGTGGCGGCTGCATCTTTGTCCCGATGCTAGCCGTGCCGGTCTCCCATC 

ATCCGCTCGCCCTCCTTTNCCCTGGGCTGCGCCCACTTGTCTTCCTGGATATTTGG 

GGGTGACTCGCCATGCTTGGCACCCTCTGCTTCCTGGTGCTGCTCTGACTCGAAGA 

CGGGACAGTCCCTGGTGCACATCCAGGGAAGAGGAGTGTCGGTAGTTCTTGCAGTA

GGCACTTTATCAGGACCTGACCTGTTGCTGGGTGATTTTAGTCTCTACAAACAGAA 

AGCGTTTCAAAGCGTCAGCTGTGGGAGCAGAGTGACCCTTTGCTGATGCTGGGGGG 
AGGGGATCTAAATCCTCATTTATCTCT 

Sequence ID 914

GGCGCCTGCTGGAGGAGGAGAGAGCTCTGCTGGCATGAGCCACAGTTTCTTGACTG 
GAGGCCATCAACCCTCTTGGTTGAGGCCTTGTTCTGAGCCCTGACATGTGCTTGGG 
CACTGGTGGGCCTGGGCTTCTGAGGTGGCCTCCTGCCCTGATCAGGGACCCTCCCC 
GCTTTCCTGGGCCTCTCAGTTGAACAAAGCAGCAAAACAAAGGCAGTTTTATATGA 

AAGATTANAAGCCTGGAATAATCAGGCTTTTTAAATGATGTAATTCCCACTGTAAT
AGCATAGGGATTTTGGAAGCAGCTGCTGGTGGCTTGGGACATCAGTGGGGCCAAGG 
GTTCTCTGTCCCTGGTTCAACTGTGATTTGGCTTTCCCGTGTCTTTCCTGGTGATG 
CCTTGTTTGGGGTTCTGTGGGTTTGGGTGGGAAGAGGGCCATCTGCCTGAATGTAA 
CCTGCTAGCTCTCCGAAGCCCTGCGGGCCTGCTTGTGTGAACCGTGTGGACAGTGG 

TGGCCGCGCTGTGCCTGCTCGTGTTGCCTACATGTCCCTGGCTGTTGAGGCGCTGC
TTTAACCTGCACCCCTNCCTTG - CTCATANATGCTCCTTTTGA 

Sequence ID - 915 nt: 230

TTTGAGACCAGCCTAGCCAACATGGTGAAACCCCATCTCTACTAAAAATACAAAAA 
TTAGCCGGGCGTGGCGGCACATGCCTATAATCCCACTTACTTGGGAGGCTGANGTA
GGAGAATCGCTTGAACCCANANAGGCAGAGTTTGCAGTGAGCCGAGATTGTGCCAT 

TGCACTCCAGCCTGGGCGACAGAGCGAGACTCCATCTAAAANAAAATAAATGAATA 
AAATAA

Sequence ID 917

TTAACTTTCCATGATATGTATTTTTTATACATTGCTGGATTTTATTTGCTAATATT
TTACTTAGGATTTAATTTTCTAAGTNGACCTATAATTNTCCTGTATAAAATTGCAT 
TTGTCACATTTTAGTATCAAGGTTGTCCTANCNCCATGAAATGGATTTANAATGGT 
TTATGTAANATAAAGTACATTTCTTCTAAAGGTTTGNGTGGATTAACTTTCAAATC 



ANCTTTTCAAATNCTGATTTAATTTTTAAAATATTTNCAAGTNTNTTTANAGTTTT
TATTTNTTNTNGAANGTTAACATTTTTATANAAAANGGTNTTATCTTTTTAAATTC
TTTGACATCAGTTTCTTCANAATTCCTTCTTTTAA 

Sequence ID 926 

GTCATATCTCTTCCCAGGGAAAGCAGGAGCCCTTCTGGAGCCCTTCAGCAGGGTCA 

GGGCCCCTCGTCTTCCCCTCCTTTCCCAGAGCCATCTTCCCAGTCCACCATCCCCA 

TCGTGGGCATTGTTGCTGGCCTGGCTGTCCTAGCAGTTGTGGTCATCGGAGCTGTG 

GTCGCTACTGTGATGTGTAGGAGGAAGAGCTCAGGTAGGGAAGGGGTGAGGGGTGG 

GGTCTGGGTTTTCTTGTCCCACTGGGGGTTTCAAGCCCCAGGTAGAAGTGTTCCCT 

GCCTCATTACTGGGAAGCAGCATCCACACAGGGGCTAACGCAGCCTGGGACCCTGT

GTGCCAGCACTTACTCTTTTGTGCAGCACATGTGACAATGAAGGACGGATGTATCA 

CCTTGATGGTTGTGGTGTTGGGGTCCTGATTTCAGCATTCATGAGTCAGGGGAAGG 

TCCCTGCTAAGGACAGACCTTAGGAGGGCAGTTGGTCCAGGACCCACACTTGCTTT 

CCTCGTGTTTCCTGATCCTGCCTTGGGTCTGTAG 

Sequence ID 938 

TGGCCATCCTTTTCCCCCCAAACACACCCCCTTAACCTATCTCTTGGGACTTAGCC 
CGACCCTCCCTCTCATTTCCCATTAAGTCTGAGAGGCAAGAGCTAGGTTAGGCAAG 
GAGGTGGTTGGCCAGAGATGGGGAACAGCCAGGTGCCCCAGTCCTCTGATTTTTCC 
TCCATCCTGCTTACCACCTCCCTGGGTACTTACAGCCTTCTCTTGGGAACAGCCGG 
GGCCAGGACTGGGTCACCTATGAGCTGAATCAGCATCTCCTCCTGAGTCCCAGGGC 
CCCTGCAGTTCCCAGTCTCTTCTGTCCTGCAGCCCTTGCCTCTTTCCCACAGGTTC 
CACTTTATATCCACCTTTTCCTTTTGTTCAATTTTTATTTTTATTTTTTTTATTAT 
TAAATGATGTGGTCTATGGAAAAAAAAATAAAAATCTGACTTAGTTTT 

Sequence ID - 939 nt : 513

GGAACCCAGTGTATTACCTGCTGGAACCAAGGAAACTAACAATGTAGGTTACTAGT 

GAATACCCCAATGGTTTCTCCAATTATGCCCATGCCACCAAAACAATAAAACAAAA 

TTCTCTAACACTGCAAAGAGTGAGCCATGCCTGTTAACACTGTAAAGAATGTAACA 

TGTGGGGGACACACAGGGGCAGATGGGATGGTTTAGTTTAGGATTTTATTAGTGCA 

TGCCCTACCCTCTGGGGGAACGTCCCATCTGAGGTTTTCTTCTCGGTGGGGGGATT 

TAACTTCTGTCCTAGGGAAAACAGTGTCTGATGAGGAGTGTTTCCAACACAGGCTA 

CATGAATTCCCCTATACCAGTGCGAAAGCAGCCAGGAGTCCCCGTTGGAAAAGAAC 

AATGCCACTCTCTTTTATGTATCTTGGTTCTGCAACTCATTTGTTGTAAGTAGGGT 

TAATCGAGTATCAGGTTC^CAGTATCCTGCCCTTATTATTTTATGATTCACTGACT 
CAAGTTCCA 
Sequence ID 947

GAGAGTGAAAAAATTCTGGTACAAATTGGGAAATTAGTATATAACAACATAGTGTT 

AAATTCAATGGGAAAAGTTTAATAAGAGGATTTGGTATCAACTGGCTGTCCAAAGA 

TAAAAATGGACCGTCCTATCACATACAAAATTGTTTTTTAGATAAAGATTTAAATA 

CAGGCACTCCTTCATTTGCGTGGTGCACCTTGAGGTGTTGCAGAAATGATGAGAGC 

TGAAACTGCAAAGCAATTTTAATACTTTATCTGTTGGAAATCTTATAGTTTTCCTG 

TGACCGTTAAAATTTTCATTAAACTATTAAAAACACCCATGACTGGTCACAAATGT 

ATTGGGAAATGGAAAAGAATTAATACACTAAAAATACAAAAAATAGAAAATATTTA 

AAATTATCTAAAAATTTGAAACATTAGAAAAATTGAGAACTAGGCAGGGCGTGGTG 

GCTCACATCTGTAATTTTAGCCCTTTGGGAGGCTGANGCAGGTGGATCACCTGANG 

TCAGGAGTTCGAGACCAGCCTGCCAACGTGGGGAAACCCCGTCTCTACTGAAAATA 

CAAAAATTANCCGGGCATGGTGGCACAAGCCTGTAATNCTTGCTNACCAGGANGCT 

GAGGCAGGAGAATCACTTGAACCCANGANG 

Sequence ID 949 

GTTTCACATGAGAAGGTAGTATTATGTACAGTGACCTTGTTTAAAGTGTCNGTTTA 
ATGTTACCACTAAGGCCCTGCCCCAGCTTTATCACCTGAGCACTAACAAGTGCTGT 
GTGGAGTTCAGTCCATGCTGGTAACTNTTGAGTATTCAGTGGGTCTTTTAACAATT 
ACCACCGTGGAGGANANAGCAAGGAAGAGAAATGCTGTGATCTTTTNCTGTTTTTA 
ATTAGNGAAAGAGGGATTANATTAAACAAATGTTACAGAGNTGTGACTNTGATCCC 
CCAGNGGTAAGCAATAATTGTANAGACTGGATTTNANAAGCCCTGAGAGTTTATTT 
TCAACCTAT3STTATTATAGNNCAATCC 

Sequence ID 1028 

ACAAGGCTTGGGGGCTGGACTCCCTCTACTGCCTCTGGCCATACCCCCTCCTGGAG 
ATGGGGTCAAGGCACCAGGACTGA 

Sequence ID - 1056 nt : 435 

TCGCTTGTAAAGCCTGAGACAGCTGCCTGTGTGGGACTGAGATGCAGGATTTCTTC 

ACACCTCTCCTTTGTGACTTCAAGAGCCTCTGGCATCTCTTTCTGCAAAGGCATCT 

GAATGTGTCTGCGTTCCTGTTAGCATAATGTGAGGAGGTGGAGAGACAGCCCACCC 

CCGTGTCCACCGTGACCCCTGTCCCCACACTGACCTGTGTTCCCTCCCCGATCATC 

TTTCCTGTTCCAGAGAAGTGGGCTGGATGTCTCCATCTCTGTCTCAACTTCATGGT 

GCGCTGAGCTGCAACTTCTTACT^ 

TGTTTTCTCAAATATTTGCTATGAAGGGTTGATGGATTAATTAAATAAGTCAATTC 
CTGGAAGTTGAGAGAGCAAATAAAGACCTGAGAACCTTCCAGA 
Sequence ID 1071

NGATATAGTNCCGC^TGGGAAAGATGANCAGGTATAACCNAGC^mATATAGCAAr 

GACTAACCCCCCTGCCTTCTGCATAATGAATTAACTAGAAATAACTTNGCAAGGAG

AGCCAAAGCTAAGACCCCNGAAACCAGACGAGCTACCTAAGAACAGNTAAAAGAGC 

ACACCCGTCTATGTAGCAAAATAGTGGGAAGATTTATAGGTAGAGGCGACAAACCT 

ACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCAACTTTAAATT 

NGCCCACAGAACCCTCTAAATCCCCTTGTAAATTTAACTGTTAGTCCAAAGAGGAA 

CAGCTCTTTGGACACTAGGAAAAAACCTTGTAGAGAGAGTAAAAAATTTAACACCC 

ATAGTAGGCCTAAAAGCAGCCACCAATTAAGAAAGCGTTCAAGCTCAACACCCACT 

ACCTAAAAAATCCCAAACATATAACTGAACTCCTNACACCCAATTGGACCAATCTA

TCACCCTATAGAAGAACTAATGTTAGTATAAGTAACATGAAAACATTCTCCTCCGC 
ATAAGCCTGCN 



Sequence ID - 1074 nt : 689

GGGAGGCGGAGGCTGCAGTGAGCTGAGATCGTGCCACTTCATTCCAGCCTGGGCAA 

CAAAGCGAAACTCTGTCTCAAAAAAAAAAAAAAAAAAAATTTGTTGACTGTTGTAA 

TTTAAAGCTTGTCATTTTTTATTTAGTAATAACACTCATTAGTGTAGTATCTATGA 

TGAACCAGGTTCTGCACAAAGTACCTTATGTTCATGGCCTCATATCGTCTTCTCCA 

AAACTCTGCAAGATAGGATTCATCACCACTTATAGGGAGAGATCTGAAAGTTTAAA 

ATTGTACCCAAGGTCACACAGCTGGTAAGTGCCAGAGCTGGGATTCCGTAGGGTGT 

TCANAGTGCCTCTCCTGCCGTAGGCTTATCACAAAAAGTCAAAGTTTGGTCATAAT 

AAAGCCTGAAGTTTGGCAGGATTTAAAAATAGTCACCANACTTTTGAGTTGGAGCA 

TCCCACCTC^CTGCTGTTCACCTTCTGTGGCAGGGAGAGTCATCATTTCCATTTCA 

GCTTGTGGAATATCTTGTCATTAACATTCTCATGCAAAAGCCATTTTATGGTGCCC 

AATGAANATGGTTAAGOTACTGCCCCAAGCCTNTGGAAGCCTTCCTAATTTTGGAC 

TTGCACTATGCAAATTGNATAATATTTTCTCTACCCTAAGCCAAATATTTTCTTCA 
CTTTTCATTCATTCTAC 

Sequence ID 1081 

CGCCGCCGCGCCGCCGTCGCTCTCCAACGCCAGCGCCGCCTCTCGCTCGCCGAGCT 
CCAGCCGAAGGAGAAGGGGGGTAAGTAAGGAGGTCTCTGTACCATGGCTCGTACAA 
AGCAGACTGCCCGCAAATCGACCGGTGGTAAAGCACCCAGGAAGCAACTGGCTACA 
AAAGCCGCTCGCAAGAGTGCGCCCTCTACTGGAGGGGTGAAGAAACCTCATCGTTA 
CAGGCCTGGTACTGTGGCGCTCCGTGAAATTAGACGTTATCAGAAGTCCACTGAAC 
TTCTGATTCGCAAACTTCCCTTCCAGCGTCTGGTGCGAGAAATTGCTCAGGACTTT 
AAAACAGATCTGC^CTTCCAGAGCGCANCTATCGGTGCTTTGCAGGAGGCAAGTGA 
GGCCTATCTGGTTGGCCTTTTTGAAGACACCAACCTGTGTGCTATCCATGCCAAAC 
GTGTAACAATTATGCCAAAAGACATCCAGCTAGCACGCCGCATACGTGGAGAACGT
GCTTAAGAATCCACTATGATGGGAAACATTTCATTCTC 

Sequence ID - 1083 nt : 193

GCGCGTCGACTTTGTTTAGACATTGAATGACTTTGTTAAAGGCACAATTAATCACA 
TTGGTTGTACTCTGNNGACAGCCTTCTTTAAAAAAAAAATAAACAATTTAAAACAA

AAAAAAAAAAAAAAAAAAAAANTTTTAACC 

Sequence ID - 1084 nt : 198

GCGCGTCGACTTTGTTTAGACATTGAATGACTTTGTTAAAGGCACAATTAATCACA 
TTGGTTGTACTCTGNNGACAGCCTTCTTTAAAAAAAAAATAAACAATTTAAAACAA 

■kAA^^AAAAAAAAAAAAAAAAlSTTTTTAACC 

Sequence ID - 1099 nt : 561

TGCATGCTTGTGGATTGGAAAAACTTTGGAGACTGATTACTTTTCATTATATATGT 

GTCACAGTGAAACAGCTTTTATGTGTCATGTAAGATTACTGCTTGCCTCTCTAAGG 

AAGGTCGTGACTGTTTAAATAGACGGGCAAGGTGGAACCTTTTGAAAGATGAGCTT 

TTGAATATAAGTTGTCTGCTAGATCATGGTTTGTATTGAACTAACAAGGTTTGCAG 

ATCTGCTGACTTATATAAAGCTTTTTGATTCCTACTAAGCTTTAAGATTTAAAAAA 

TGTTCAATGTTGAAATTTCTGTGGGGCTCTATTTTTGCTTTGGCTTTCTGGTGAGA 

GAGTGAGGAAGCATTCTTTCCTTCACTAAGTTTGTCTTTCTTGTCTTCTGGATAGA 

TTGATTTTAAGAGACTAAGGGAATTTACAAACTAAAGATTTTAGTCATCTGGTGGA 

AAAGGAGACTTTAAGATTGTTTAGGGCTGGGCGGGGTGACTCACATCTGTAATCCC 

AGCACTTTGGGAGGCCAAGGCAGGCAGAACACTTGAAGGAGTTCAAGACCAGCGTG 



Sequence ID 1109 

TTTGNCGGTNTTGGANNNIWANAA^ 

AATTAANATGGNTTTNGNGGGTTCNTTNCT^ 

NTCMTNCl^TTCCTTmcCCTNAANCTACCTTCCCCCNATTTTCTCCCCTNTTCN 

TNAATTANCATCCTCTCOTCNTANNTCNANACNTTAATGGCAANACTATCTAATAN 

CNANNATAANANCTCCTGTNNNCCACATNTCTTATTNNNCGCl^CANGTTNCANNC 

CCNCAGAGTNAACTCATCCTCNNCl^AANTTCATATCGTGNNCTNTNNNCNNTNGC 

GCGANATATTAAIWANACC^GTANNTNmANACANNANirraNGNAANAANCCTTCT 
NAIWTTTTAGCNTCNNGCNNTAAC^^ 
ATNATN CTNC3SINCGAANTNTCANN CNTCTCCNCTTNAATGNNTTCCCATGNATTAA

NTNCCTCGlSnsnSTAN^^ 

TGGTCCANTISnNTCGTTNGNCG 

Sequence ID 1118 

GGATTTTAGAGGAAGGCGCTNGGTTACATTGGAGAACTGGAGTGGTCTGGAGTTCC 

ACGGTGTAGTGGACCAGAGGCCACCTCTCCTGGGCTTCTCAGTGTCTCGCCGGCGG 

GGTTCGGCCTGAGCTGGATTGACATAGCCCTTGGCGGATTTAAACAACCTAAACAT 

TAAGCAGTACAGCTGCCTCAAACCTTTGGGATTTTCAGAATGACTGACACTGCCGA 

AGCTGTTCCAAAGTTTGAAGAGATGTTTGCTAGTAGATTCACAGAAAATGACAAGG 

AGTATCAGGAATACCTGAAACGCCCTCCTGAGTCTCCTCCAATTGTTGAGGAATGG 

AATAGCANAGCTGGTGGGAACCAAAGAAACAGAGGCAATCGGTTGCAAGACAACAG 

ACAGTTCAGAGGCAGGGACAACAGATGGGGGTGGCCAAGTGACA^^ 

AGTGGCATGGACGATCCTGGGGTAACAACTACCCGCAACACAGACAAGAACCTTAC 

TATCCCCAGCAATATGGACATTATGGTTACAACCAGCGGCCTCCTTACGGTTACTA 

CTGATAGAAATGTTGGCAGCTTTTAGTAAAAGCATTTACTCTGTTACCATGAGAAA 

Sequence ID 1125 

NGACTGGCTCCCGAAAAGAAGGGTGGCGAGAANAAAAAGGGCCGTTCTGCCATGGA 

CGAAGTGGTAACCCGCGAATACACCATCAACATTNACAAGCGCATCCATGGAGTGG 

GCTTCAAGAANCGTGCACCTCGGGCACTCAAAGAGATTCGGAAATTTGCCATGAAG 

GAGATGGGAACTCCATATGTGCGCATTGACACCAGGCTCAACAAANCTGTCTGGGC 

CAAAGGAATAAGGAATGTGCCATACCGAATCCGTGTGCGGCTGTCCANAAAACGTA 

ATGAGGATGAAGATTCACCAAATAAGCTNTATACTTTGGTTACCTATGTACCTGTT 

ACCACTTTGAAAAATCTACAGACAGTCAATGTGGATGANAA 

CAGATCAAANAAANT 

Sequence ID - 1139 nt : 503 

CAGCACTGCCAGTGGAGATGGGCGTCACTACTGCTACCCTCATTTCACCTGCGCTG 

TGGACACTGAGAACATCCGCCGTGTGTTCAACGACTGCCGTGACATCATTCAGCGC 

ATGCACCTTCGTCAGTACGAGCTGCTCTAAGAAGGGAACCCCCAAATTTAATTAAA 

GCCTTAAGCACAATTAATTAAAAGTGAAACGTAATTGTACAAGCAGTTAATCACCC 

ACCATAGGGCATGATTAACAAAGCAACCTTTCCCTTCCCCCGAGTGATTTTGCGAA 

ACCCCCTTTTCCCTTCAGCTTGCTTAGATGTTCCAAATTTAGAAAGCTTAAGGCGrG 

CCTACAGAAAAAGGAAAAAAGGCCACAAAAGTTCCCTCTCACTTTCAGTAAAAATA 

AATAAAACAGCAGCAGCAAACAAATAAAATGAAATAAAAGAAACAAATGAAA 

TATTGTGTTGTGCAGCATTAAAAAAAATCAAAATAAAAATTAAATGTGAGCAAAG 
Sequence ID - 1148 nt : 587

TGAAAAATAAAGTTTTTATGTATATTCTACATATGTATATGTTGGTAGAAAGCAAA 
AACGCTAGGTAAAAATAAATGTAATACAATTTTAGCTATGAACCAAAAAACCATTT 

GTGGTGTGGATGCAAGAAAGTCTGGATGGGTGCAGAGTTCTCCATGTTTCACTTCT

GACATTTGAAAATACGCAGTTTGCATTTGATACGTCAAATGTTATTTTTAAGAAAA

CCAATAAAATCATTAAAACCGAAAAGGCAGTTTTGCTTGTTTTTACCTTAGTTGGA 

GTTATCTGCAATTGCCGTATTAGTGTTTTAAGGAACTTGTAAGTAAGCTCCTTAGT 

CCCCTTTAGAGCTACGAAACATGTCAATTTTACTTTTCTCCAGCTTTTTGGAATCT 

TATCTAAATTACCATGTAGAGTTCTGCATAGCTTCAAATTCTCTTAGCCAATGTGG 

TCTGTAAGTGTCTATCGATGAATTTCACCGTTAATTGCCGTAGTATACTGTCCTGT 

ACCGGATGTGAAGAGGAGCAACTCTGCACAGTGCACTGGTTGCTCCCATGGTAGGA 
ANGAATGGCTTATCAATGGTCGGATTT 

Sequence ID - 1160 nt : 650

GGAGGATGGAGCAGTGAGCGGGTCTGGGCGGCTGCTGGCAGCGCCATGGAGACGGT 

ACAGCTGAGGAACCCGCCGCGCCGGCAGCTGAAAAAGTTGGATGAAGATAGTTTAA 

CCAAACAACCAGAAGAAGTATTTGATGTCTTAGAGAAACTTGGAGAAGGGTGAGTG 

TAAAGAAACTATAGGTAGGTCATTGGGTCCCAGTCTTTTTCCTGCCCCAGAAGAAG 

CAGAAGGATATGAACCTTTCAGCATTGTTCTAGGTGGGGTGGAAGGTAAATTTACA 

GCTTGTGATGTCCTTCTTCGCTTTACTCCAATCCCTATTATAGACAGATTTAGTGA 

TTCCTGGTCTTTTTAACACGAAGAATATCTATTGTTTTCTCTTTTGTAGGATCTGT 

ATGATTTTATCTACTTAACAGATAGCACTAATTAGATTAAAATTCTATAAGAAACT 

TTTTAATTTGCTGTTCATAATTTCTGATTGGTATGCAATAACTGTTTCAATGAAAA 

TCAATGTAATTTAGTATTTTAATATTTGCACCTTTGTGAAATATAGTAAATAAATT 

AAGCACTATCACCACCTTCACAGCTACTTAGGAGATCCACAATCCTGGGTTGGGAG 

CCAGTGGATTTCCTGAAACACAGATTTGTTAATG 

Sequence ID - 1165 nt : 502 

CTCAAGTGAATCCTGGCTTCTTGGAAGCGCTTGCCTAGACGAGACACAGTGCATAA 

AAACAACTTTTGGGGGACAGGTATGTTTTCTTGCAGCTGCGGTTGTAAGGTCTTGG 

CAAGACAAGCAGTGTGGCCAGAATTTTGAACTTCTGATGAATGTGTAATGCAAAGG 

ACCTTGTACATTTTTTTGTTTCAAGGTCCTCAAAATGAGCACATGAAGAGGTTGCT 

GTGAAACTTTAAGTGGCCCTACTGCGCAGAAGCATTCAGATGTCACTTGATGATCT 

GTAAGGGAACTTGCTGATTTGGGAATGTGCTTAGGGAACACACATTCCTTTTGACA 

GGGTCTGTCACTGGGTGGGTGATGAATTATACAGATGACATGTGCTTTTTTTTCTT 

TTTTCAACCTCAATGGTATTCCTACAGGAAATGGATAACCATTTTAACTGTATTTT 

TTGCAGCCCGTACCTTCTTGGGAATACAATTGTCTAACTTTTTATTTTTGGTCT 
Sequence ID - 1172

TCSQATTCCATTTCCCACTTAAA^^ ^ GCAATAATAACC ATGAATCTAAA 

^cacaaagt^ttgtataatgat^ttcaattatI^^^ 

ATCCTCCXTC^CAATTOAGAAAGGA^A^™™™ 1 

ACCCAA^CCAGAGCATATAAATA^TAAAGGAAGA^AAAGlGl^r^ 
^TAAATATAACAATTATAAATATTTTATCTAA^f TC CTGTGATCA 

^-^ATAATAGGGTGGTGACATTAACAPnr^^^^^^^^^^^^^^*^^^^^^^^^^ 

gggagaaaaagct™^^^^^^™-- 



K3CCAT 



Sequence ID 1178 
ATTGTGTTGGCCACCO 
GCCTCTCGCAAAGGATCTCCTTCAT 



attgtgttggccacccgggaattcgcggccgcgtcgacctacgcacacgagaacat 

ccctctccagaagaggagaagaggaaac 

GGTGAGGAGACGG^ T ,^™™!^ raCCTACn ' CaTG < a ^<^ TO CCCA 



AG^GAAA C GCCTGG TO «GA S CCCCAATTG CT AC^CA^r^ ^ 
GGAGAGGGCTTGCTGTAGTGGGGAl^^r GflAATCCCCA 
TGTTGTAGTGTTAGCTCT^r!^ ^ ACCT ^ raGTTTOA ^ 



TCATCACATn 
3ATGATGAGTT 
TATCTGCTTTT 

GTGTGTTGGCTGCTCCACTGTCi ~ — — «^ACGGTAGTTTT 



CTTAGGAGGAGTGATTCATTTCACCGTGATCTCTCATCACATTT 
ITGTGTTGGGAAA 

3GTGTAATAGAGGL 
TTCAGGATGCTATAAAATCACCACGGTCTTTAGCCATGCACAAAC 



CACATACAACCCCTACGTTTTTTTG' 



GGGCATAAGTGCAGGAAAGACGGGTGTAA- 



'TGTTGGGAAACAATGTAATGGATGATGAGTT 



CAGAAGGATGTTCCTTCAGGAGG^n AG ^^^ 
Sequence ID - 1180



CCGTGCAGCACTAACGTATTGGCACCTGCCTCCTCT' 



VGTCTCGGTCGGTAAGGGAAGTCTTCCAAGT 

CTGCCTCCTCTTCGGCCACCCCCCAGATG 
CCACGACTCTrsa r>r«a atTv nm/invn 

TCCACTGCCGTCTCCACAGGAAACO 



GCAGCTGTGACTGTGTCAAGGGAAGO 



TCAAGGCATTTATTGCAGTGTACTATTO 



TCGGCCACCCCCCAGATGAG 
CACGACTCTGACCATAGTCTTCTCTCAGCT 
'CAGAAGTTCTGTGAACAAGTCCATGCTGCCA 



GCTTCCAAAGGATCAGGCCCTGAGAACA 
ATGACCTTATTTCCTACAACAGTGTCTGGGTTGCGTGCCAGCAGATGCCTCAGATA

CCAAGAGATAACAAAGCTGCAGCTCTTTTGATGCTGACCAAGAATGTGGATTTTGT 

GAAGGATGCACATGAAGAAATGGAGCAGGCTGTGGAAGAATGTGACCCTTACTCTG 

GCCTCTTGAATGATACTGAGGAGAACAACTCTGACANCCACAATCATGAGGATGAT 
GTGTTG

Sequence ID - 1181 nt : 155 

CGCCACTTATCCAGTGAACCACTATCT^CGAAAAAAACTCTACCTCTCTATACTAAT 
CTCCCTACAAATCTCCTTAATTATAAGATTC^^ 

AAAAAAAAAAAAAAAAAAAAAAA?yy^AAAAAAAAAAAAAAAA
Sequence ID 1182 

CATTGTGTTGGCNCCCGGGAATTCGCGGCCGCGTCGACTTTTTGTGTTGTTTGGAG 
C^GAAATACTAAAGAAGATTCCGGGCCGAGTATCCACAGAAGTGGACGCAAGGCTC 

TCCTTTGATAAAGATGCGATGGTGGCCAGAGCCAGGCGGCTCATCGAGCTCTACAA

GGAAGCTGGGATCAGCAAGGACCGAATTCTTATAAAGCTGTCATCAACCTGGGAAG 
GAATTCAGGCTGGAAAGGAGCTCGAGGAGCAGCACGGCATCCACTGCAACATGACG 
TTACTCTTCTCCTTCGCCCAGGCTGTGGCCTGTGCCGAGGCGGGTGTGACCCTCAT 
CTCCCCATTTGTTGGGCGCATCCTTGATTGGCATGTGGCAAACACCGACAAGAAAT 

CCTATGAGCCCCTGGAAGACCCTGGGGTAAAGAGTGTCACTAAAATCTACAACTAC
TACAAGAAGTTTAGCTAGAAAACCATTGTCATGGGCGCCTCCTTCCGCAACACGGG 
CGAGATCAAAGCACTGGCCGGCTGTGACTTCCTCACCATCTCACCCAAGCTCCTGG 
GAGAGCTGCTGCAGGACAACGCCAAGCTGGTGCCTGTGCTCTCAGCCAAGGCGGCC 
CAAGCCAGTGACCTGGAAAAAATCCACCTGGATGAGAAGTCTTTCCGTTGGTTGCA 

CAACGAGGACCAGATGGCTGTGGAGAAG

Sequence ID - 1183 nt : 479 

CGTGGCAGCCATCTCCTTCTCGGCATCATGGCCGCCCTCAGACCCCTTGTGAAGCC 

CAAGATCGTCAAAAAGAGAACCAAGAAGTTCATCCGGCACCAGTCAGACCGATATG 

TCAAAATTAAGCGTAACTGGCGGAAACCCAGAGGCATTGACAACAGGGTTCGTAGA

AGATTC^^GGGCC^GATCTTGATGCCCAAC^TTGGTTATGGAAGC^CAAAAAAAC 
AAAGCACATGCTGCCCAGTGGCTTCCGGAAGTTCCTGGTCCACAACGTCAAGGAGC 
TGGAAGTGCTGCTGATGTGCAACAAATCTTACTGTGCCGAGATCGCTCACAATGTT 
TCCTCCAAGAACCGCAAAGCCATCGTGGAAAGAGCTGCCCAACTGGCCATCAGAGT 
CACCAACCCCAATGCCAGGCTGCGCAGTGAAGAAAATGAGTAGGCAGCTCATGTGC
ACGTTTTCTGTTTAAATAAATGTAAAAACTG 



WO 2004/046382 



PCT/GB2003/005102 




Sequence ID - 1185 nt : 628

CTTTGATTACCTTTGAGTATTAGGTTGAAAGCTTCTCTGTGCTTGATTGAACATTG 

TGATGATGTTGATTGGGTCATGTCAGATTTAGACAGTGTTGTGTTTAAGATAAATG 

TTTAATGGCTCTTAGCAGTGTTCATGCCTCCCCTTTTCCCCTGATACTTTAAAAAC 

AGAATATACAGAAAAGGGGAGTTGGGTGAAGAATCACCATATTCTCATTACCAGAG 

TAGTGTCTACCAGCTGTTTTCACATTTTTCTGTTTCCTTCTGTCCTTGGAATCCTT 

TTTTTAGATCCTTGTAATACTAGTAAAGATATTCCACTCTGTGTTGTAAGCATTTT 

TCCATTTTGCTCCATGGTCTTCATAATGCCCTGTGGTCCTTTATTAAGGGGATGCA 

CCATGTAGAGGTGAAAGGCTTTCCTTGACTTGGCCACCATTTCTGTATTTTCCTTA 

GAGGAGGAGGTTJ-CCAACATTTCTTTTTTAGAGACAGAGTCTCGTTCTGACACGCA 

<3GCAGGAGTGCAGTGGCATGATAACAGCTCACTGCAGCCTCGAACTCCTGGGCTCA 

AGTTATCCTCCCACCTCAGCTTCCTGAGTAGCTAGGACTGCAGGTGCCTGCCACCA 
CACCCAGCTAAT 

Sequence ID - 1186 nt : 494

CAGCCCTCCGTCACCTCTTCACCGCACCCTCGGACTGCCCCAAGGCCCCCGCCGCC 

GCCTCCAGCGCCGCGCAGCCACCGCCGCCGCCGCCGCCTCTCCTTAGTCGCCGCCA 

TGACGACCGCGTCCACCTCGCAGGTGCGCCAGAACTACCACCAGGACTCAGAGGCC 

GCCATCAACCGCCAGATCAACCTGGAGCTCTACGCCTCCTACGTTTACCTGTCCAT 

GTCTTACTACTTTGACCGCGATGATGTGGCTTTGAAGAACTTTGCCAAATACTTTC 

TTCACCAATCTCATGAGGAGAGGGGAACATGCTGAGAAACTGATGAAGCTGCAGAA 

CCAACGAGGGTGGCCGAATCTTCCTTCAGGATATCAAGAAACCAGACTGTGATGAC 

TGGGAGAGCGGGCTGAATGCAATGGAGTGTGCATTACATTTGGAAAAAAATGTGAA 

^CAGTCACTACTGGAACTGCACAAACTGGCCACTGACAAAAATGAC 

Sequence ID - 1188 nt : 599

GGGAGACAAGCCCAGCCTTTCGGCGAGNATACGTCTAACCCTGTGCAACAGCCACT 
ACATTACTTCAAACTGAGATCCTTCCTTTTGAGGGAGCAAGTCCTTCCCTTTCATT 
TTTTCCAGTCTTCCTCCCTGTGTATTCATTCTGATGATTATTATTTTAGTGGGGGC 
GGGGTGGGAAAGATTACTTTTTCTTTATGTGTTTGACGGGAAACAAAACTAGGTAA 
AATCTACAGTACACCACAAGGGTCACAATACTGTTGTGCGCACATCGCGGTAGGGC 
GTGGAAAGGGGCAGGCCANAGCTACCCGCAGAGTTCTCAGAATCATGCTGAGAGAG 
CTGGAGGCACCCATGCCATCTCAACCTCTTCCCCGCCCGTTTTACAAAGGGGGAGG 
CTAAAGCCCAGAGACAGCTTGATCAAAGGCACACAGCAAGTCAGGGTTGGAGCAGT 
AGCTGGAGGGACCTTGTCTCCCAGCTCAGGGCTCTTTCCTCCACACCATTCAGGTC 
TTTCTTTCCGAGGCCCCTGTCTCAGGGTGAGGTGCTTGAGTCTCCAACGGCAAGGG 
AACAAGTACTTCTTGATACCTGGGATACTGTGCCCAGAG 
Sequence ID 1189

GGGAGACAAGCCCAGCCTTTCGGCGAGATACGTCTAACCCTGTGCAACAGCCACTA 
CATTACTTCAAACTGAGATCCTTCCTTTTGAGGGAGCAAGTCCTTCCCTTTCATTT 
TTTCCAGTCTTCCTCCCTGTGTATTCATTCTCATGATTATTATTTTAGTGGGGGCG 
GGGTGGGAAAGATTACTTTTTCTTTATGTGTTTGACGGGAAACAAAACTAGGTAAA
ATCTACAGTACACCACAAGGGTCACAATACTGTTGTGCGCACATCGCGGTAGGGCG 
TGGAAAGGGGCAGGCCAGAGCTACCCGCAGAGTTCTCAGAATCATGCTGAGAGAGC 
TGGAGGCACCCATGCCATCTCAACCTCTTCCCCGCCCGTTTTACAAAGGGGGAGGC 
TAAAGCCCAGAGACAGCTTGATCAAAGGCACACAGCAAGTCAGGGTTGGAGCAGTA 
GCTGGAGGGACCTTGTCTCCCAGCTCAGGGCTCTTTCCTCCACACCATTCAGGTCT
TTCTTTCCGAGGCCCCTGTCTCAGGGTGAGGTGCTTGAGTCTCCAACGGCAAGGGA 
ACAAGTACTTCTTGATACCTGGGATACTGTGCCCAGAGCCTCGAGGAGGT 

Sequence ID 1190 

GTTTAAATTTGACAAACTAAAGCTNATNACTGCTATAAGAGTAATAACTGCTCATT 
TTCCATAACTCATTCTTAAAGTTTTAGTAATGTAAAAGTTATTTTTTTGCAGTAAG 
TTATAATGATAGAAGCTTACATGTTTTTTCATGCCTCATCTGTTTCCCCTTAAAAC 
TATAATTATCAGTAAAGTCCTGTGGTATTTTTCAATTTGTAAGAAACTAGGCTATA 
TATACATTGGGAAAAACAGCCTTCATTTGTCAATGCACTAGTGTTCCAAAGGTTTC 
TGGTAATTGTGTGCTATTGCTTTTTGTTGACTTGCAAAAAAAAAAAAAAAAAAATT 
ACTATGACTTGNGGTAGCCCTGCAACCTTCGGAAGTGCTTAGCCCAGTCTGACCAT 
ACATTTATATTTANAATGCTTAGGTAAATAAATAATATGCCTAAACCCAATGCTAT 
AAGATACTATATAATATCTCATAATTTTAAAAATCACTGTTTTGTATAATAATAAA 
ACAAGGCAGGCAAGCTGTTCTACAATGACTGTTGGTAAGGGTGCTGAGGAAGAAAA 
ACAAACAATCTTGATTCAGGGATAGTGAATAGACAAAAAATGTCCTAATCAATGAA 
GCTGTGTGATGATTCTGATTGACAGAGA 

Sequence ID 1191 

GTGCAAAGTGTTATATCCACTTTCAACAAAGAGAGAAGCTGAAAAGCTAACCCAAT 
GTTAATTTTGGATCACACACATTCAGTGTAGACTTTAAGATTTTACTTCTGTTGGA
GTAGCTATATTATTTCTAGTTAAAAAACTCTCTATATACATATTTATTTGTTTTTC 
TACTTGTTTAATATTTTTCTCTTCCAATTAGGAACTCAATATGGAATAAAAAATAT 
TTAAATGTATTTTACTCAAACGTGTGTGTATATATGTTTGTGTGCATGATAAGGAG 
AGTGAGAGCAAGAGTAAGAGAGAGAGAGCACGCATAGATGGAAGCACACATTTAAT 
GTCTATGAAATGAGAAAACATTAAGGCTAAGATATTTTTCCTTCTGAACTAGCAGA
TTGTATCAATGGCTGGTC^CTTAAATTAATCAGTTTGTAAAGATATTTAAAAGGTA 
TGTCTACCTTCTTGCAATTAATTTGATTATGTTCTAATGGCATGGCAAGAGAAATG 
AAAGAAGATAACTAAAAGTTAAAAGTCGTTGC^TGTTTTTGTTGCAGCATACCCTT

CTTTCAGGCTACCGAATAACCTTGATTGACATTGGATTAGTAGTAGAATACCTCAT 
TGGTAGAGCATATCGCAGCANCTACACTAGAAAACAT 


Sequence ID 1192 

GTCTGGAACTCCAGACCTCAGGTGATACCCCTGCCTCAGCCTCCCAATGTGCTGGG 

ATTACAGCTGTGAAGCCACCGCGCCCGGCTGCTGTGATAGTTGAGATGTAAACCAA 

AAATAAAATTCTAAGCCACCCAATCCGACTGAATGGACCCTTCCTGTTGAGCAAGG 

ACATTCCAAAGTAAACTGAAAAGACCAGCTTAGGCCATGATGGGAAGGGGAGGTGT 

CAACATGCCTCATTCTACCTTCCTCCCTCTGGAATCCAGACACAACTGACCAGCAT 

TAACATTAAAACAGAGATCTTAAGCTGGGCACGGTGGCTCATGCCTGTAATCCCAG 

CACTTTGGGAGGCCAAGGTGGGATCACCTGAGGTCGGAAGTTCAAGACCAGCCTGG 

CCGGTATGGTGAAGCCATGTCTCTACTGAAAATGCAAAATTGGCCGGACATTGTGG 
TGCA 

Sequence ID 1193 

TNCNTTTTTTTTCCCNCGGGAAAGCGCGCCATTGTGTTGGTCCCCGGGAATTCGCG 
GCCGCGTCGACGAGAAATGGCTTGAACCCAGTAGGCAGAGGTTGTAGTGAGCCCAG 
AATNGGNCACCTGCAC^TTTANCCNTGGGTGACAAAANTGAAAACTTTGTCTNAAA 
AAAAAAAAAAAAAAATTTTAANTNAAATNAAAAANCCTTTNCNTTNTTTTTNAAAN 
NGGGGGGGGl^TTTTTNGGGNTTNGNNNTGGTAAAAANTNNNTTTTTTTTTTTTTA 
GGGGCCNANNCCCCNTTTTANAAAANCCNGNTTTTNAAAAAANTTTTTTNCCCNCN 
NTTNGGGGGGGGGGNTTTTNANCNNTNTTNGGGGGGGNNCCCCTNTTANNACCNNC 
AAANTTTTTAl^TTTTTGNNNAANNN CCCCCTTTTTTNNTTTTTTTTGNGGGGGGG 

GGGNNGCCCCCNNCCTTTNGGGGGGGGGGm^GNAAAANNACTTTTNAAAANNA 
AGGGNNGGGGGNANATNNCCCCCCCNGGNTTTTTTTTTTAAAAANTNAANNGGGGG 
GGGNNNCTNANTNGGGGCNCCCANNGGGGGNTTANAANNATTTTCTNCCCAAACCC 
CCNGNTTTTATNNCCCCCCCCCCCCNCNNNNGAANGGGNGGNCCNTTTTTTTTATT 
TTTITOGGNGGGNAAAAAANTTTNAAAAANNANNATNTTTTTTCCCCCCCCCCCCNC 
TTTTNGGNAAANCC3^GGGGGGNTCCTTTTTNAAANN1WCCCCCAAAAAAAANTTT 
TTTTNTTNTNTTTTTCTCTNGGGGN CCNNANTTNTANANTTTTNCNCCNAAAAAAA 

ANGGGNCCCCTTTTTTTNC^GGNNGGIWCCCAAAANNTTTTTTTTNAAAAAAAAAA 
AAAA 

Sequence ID 1195

GTTCGTGACNTTCGGAGCTACCTGACAGAGCAGAGTCAACCAGGNTCTGCCCAAAG 
AGAGTGTTAGGCCTGAGCTTGAGAGCCCTGGAGAGACGTGTGCACAAAATGTGACC 
TGAGGCCCTAGTCTAGCAAGAGGACATAGCACCCTCATCTGGGAATAGGGAAGGCA
CCTTGCAGAAAATATGAGCAATTTGATATTAACTAACATCTTCAATGTGCCATAGA 
CCTTCCCACAAAGACTGTCCAATAATAAGAGATGCTTATCTATTTTA 

Sequence ID - 1196 nt : 412

GTCGACGCGGCCGCGGTCGCTGGAGNCGATCAACTCTAGGCTCCAACTCGTTATGA 

AAAGTGGGAAGTACGTCCTGGGGTACAAGCAGACTCTGAAGATGATCAGACAAGGC 

AAAGCGAAATTGGTCATTCTCGCTAACAACTGCCCAGCTTTGAGGAAATCTGAAAT 

AGAGTACTATGCTATGTTGGCTAAAACTGGTGTCCATCACTACAGTGGCAATAATA 

TTGAACTGGGCACAGCATGCGGAAAATACTACAGAGTGTGCACACTGGCTATCATT 

GATCCAGGTGACTCTGACATCATTAGAAGCATGCCAGAACAGACTGGTGAAAAGTA 

AACCTTTTCACCTACAAAATTTCACCTGCAAACCTTAAACCTGCAAAATTTTCCTT 
TAATAAAATTTGCTTGTTTT 

Sequence ID 1197 

CCGCCAACATGGGCCGCGTTCGCACCAAAACCGTGAAGAAGGCGGCCCGGGTCATC 

ATAGAAAAGTACTACACGCGCCTGGGCAACGACTTCCACACGAACAAGCGCGTGTG 

CGAGGAGATCGCCATTATCCCCAGCAAAAAGCTCCGCAACAAGATAGCAGGTTATG 

TCACGCATCTGATGAAGCGAATTCAGAGAGGCCCAGTAAGAGGTATCTCCATCAAG 

CTGCAGGAGGAGGAGAGAGAAAGGAGAGACAATTATGTTCCTGAGGTCTCAGCCTT 

GGATCAGGAGATTATTGAAGTAGATCCTGACACTAAGGAAATGCTGAAGCTTTTGG 

ACTTCGGCAGTCTGTCCAACCTTCAGGTCACTCAGCCTACAGTTGGGATGAATTTC 

AAAACGCCTCGGGGACCTGTTTGAATTTTTTCTGTAGTGCTGTATTATTTTCAATA 
AATCTGGGACAA 

Sequence ID 1198 

CAGAGGTGGGAGGATTGCTTCAGTTCAAGAGTTTGAGACCAGCCTGGGTAACATGG 

CGAAACCCTGTCTTTACAAAAAATGCAAACCTTTGCCGCATGTGTTGGGGTGCGCC 

TGTAGTCCCAGCTTCTCGGGAGGCTGAGGTGGGGGGACCACCTGAGCCATGGAGGT 

TGAGGCTGCAGTGAGCCGTGATACCACCACTGTACTCTAGCCTGGGCCATAGAGTG 
AGACACCCTGCCTCAGAAATA 

Sequence ID - 1199 nt : 439 

CCCATCCCCTCGACCGCTCGCGTCGCATTTGGCCGCCTCCCTACCGCTCCAAGCCC 

AGCCCTCAGCCATGGCATGCCCCCTGGATCAGGCCATTGGCCTCCTCGTGGCCATC 

TTCCACAAGTACTCCGGCAGGGAGGGTGACAAGCACACCCTGAGCAAGAAGGAGCT 

GAAGGAGCTGATCCAGAAGGAGCTCACCATTGGCTCGAAGCTGCAGGATGCTGAAA 
TTGCAAGGCTGATGGAAGACTTGGACCGGAACAAGGACCAGGAGGTGAACTTCCAG
GAGTATGTCACCTTCCTGGGGGCCTTGGCTTTGATCTACAATGAAGCCCTCAAGGG 
CTGAAAATAAATAGGGAAGATGGAGACACCCTCTGGGGGTCCTCTCTGAGTCAAAT 
CCAGTGGTGGGTAATTGTACAATAAATTTTTTTTTGGTCAAATTTAA

Sequence ID - 1200 nt : 526

CTGGAGACGACGTGCAGAAATGGCACCTCGAAAGGGGAAGGAAAAGAAGGAAGAAC 

AGGTCATCAGCCTCGGACCTCAGGTGGCTGAAGGAGAGAATGTATTTGGTGTCTGC 

CATATCTTTGCATCCTTCAATGACACTTTTGTCCATGTCACTGATCTTTCTGGCAA 

GGAAACCATCTGCGGTGTGACTGGTGGGATGAAGGTAAAGGCAGACCGAGATGAAT 

CCTCACCATATGCTGCTATGTTGGCTGCCCAGGATGTGGCCCAGAGGTGCAAGGAG 

CTGGGTATCACCGCCCTACACATCAAACTCCGGGCCACAGGAGGAAATAGGACCAA 

GACCCCTGGACCTGGGGCCCAGTCGGCCCTCANAGCCCTTGCCCGCTCGGGTATGA 

AGATCGGGCGGATTGAGGATGTCACCCCCATCCCCTCTGACAGCACTCGCAGGAAG 

GGGGGTCGCCGTGGTCGCCGTCTGTGAACAAGATTCCTCAAAATATTTTCTGTTAA 
TAAATTGCCTTCATGTAAACTG 

Sequence ID - 1201 nt : 613

CTTAAGTATGCCCTGACAGGAGNATGAAGTAAAGAAGATTTGCATGCAGCGGTTCA 

TTAAAATCGATGGCAAGGTCCGAACTGATATAACCTACCCTGCTGGATTCATGGAT 

GTCATCAGCATTGACAAGACGGGAGAGAATTTCCGTCTGATCTATGACACCAAGGG 

TCGCTTTGCTGTACATCGTATTACACCTGAGGAGGCCAAGTACAAGTTGTGCAAAG 

TGAGAAAGATCTTTGTGGGCACAAAAGGAATCCCTCATCTGGTGACTCATGATGCC 

CGCACCATCCGCTACCCCGATCCCCTCATCAAGGTGAATGATACCATTCAGATTGA 

TTTAGAGACTGGCAAGATTACTGATTTCATCAAGTTCGACACTGGTAACCTGTGTA 

TGGTGACTGGAGGTGCTAACCTAGGAAGAATTGGTGTGATCACCAACAGAGAGAGG 

CACCCTGGATCTTTTGACGTGGTTCACGTGAAAGATGCCAATGGCAACAGCTTTGC 

CACTCGACTTTCCAACATTTTTGTTATTGGCAAGGGCAACAAACCATGGATTTCTC 

TTCCCCGAGGAAAGGGTATCCGCCTCACCATTGCTGAAGAGAGAGACAAAAGA 

Sequence ID 1202 

GGAATTCGCGGCCGCGTCGACCTCTGCTCGAATTGACAGAAAAGGATTCTGTGAAG 
AGTGATGAGATTTCCATCCATGCTGACTTTGAGAATACATGTTCCCGAATTGTGGT 
CCCCAAAGCTGCCATTGTGGCCCGCCACACTTACCTTGCCAATGGCCAGACCAAGG 
TGCTGACTCAGAAGTTGTCATCAGTCAGAGGCAATCATATTATCTCAGGGACATGC 
GCATCATGGCGTGGCAAGAGCCTTCGGGTTCAGAAGATCAGGCCTTCTATCCTGGG 
CTGCAACATCCTTCGAGTTGAATATTCCTTACTGATCTATGTTAGCGTTCCTGGAT 




CCAAGAAGGTCATCCTTGACCTGCCCCTGGTAATTGGCAGCAGATCAGGTCTAAGC 
AGCAGAACATCCAGCATGGCCAGCCGAACCAGCTCTGAGATGAGTTGGGTAGATCT 
GAACATCCCTGATACCCCAGAAGCTCCTCCCTGCTATATGGATGTCATTCCTGAAG 
ATCACCGATTGGAGAGCCCAACCACTCCTCTGCTAGATGACATGGATGGCTCTCAA 
GACAGCCCTATCTTTATGTATGCCCCTGAGTTCAAGTTCATGCCACCACCGACTTA
TACTGAGGTGGATCCCTGCATCCTCAACAACAATGTGCAGTGAGCAT 

Sequence ID - 1203 nt : 692 

TGCAGAGGGGTCCATACGGCGTTGTTCTGGATTCCCGTCGTAACTTAAAGGGAAAC 

TTTCACAATGTGCGGAGCCCTTGATGTCCTGCAAATGAAGGAGGAGGATGTCCTTA
AGTTCCTTGCAGCAbGAACCCACTTAGGTGGCACCAATCTTGACTTCCAGATGGAA 
CAGTACATCTATAAAAGGAAAAGTGATGGCATCTATATCATAAATCTCAAGAGGAC
CTGGGAGAAGCTTCTGCTGGCAGCTCGTGCAATTGTTGCCATTGAAAACCCTGCTG 
ATGTCAGTGTTATATCCTCCAGGAATACTGGCCAGAGGGCTGTGCTGAAGTTTGCT 

GCTGCCACTGGAGCCACTCCAATTGCTGGCCGCTTCACTCCTGGAACCTTCACTAA
CCAGATCCAGGCAGCCTTCCGGGAGCCACGGCTTCTTGTGGTTACTGACCCCAGGG 
CTGACCACCAGCCTCTCACGGAGGCATCTTATGTTAACCTACCTACCATTGCGCTG 
TGTAACACAGATTCTCCTCTGCGCTATGTGGACATTGCCATCCCATGCAACAACAA 
GGGAGCTCACTCAGTGGGTTTAATGTGGTGGATGCTGGCTCGGGAAGTTCTGCGCA 

TGCGTGGCACCATTTCCCGTGAACACCCATGGGAGGTCATGCCTGATCTGTACTTC
TACAGAGATCCTGAAGAGAT 

Sequence ID 1204 

TTTTTTTTTTTTTCCTGCGGGAAAGCGCGCCATTGTGTTGGTACCCGGGAAATTCG 

CGGCCGCGTCGACACAGGCCCCAGCATCAAGATCTGGGATTTAGAGAGGAAAGATC

ATTGTAGATGAACTGAAGCAAGAAGTTATCAGTACCAGCAGCAAGGCAGAACCACC 
CCAGTGCACCTCCCTGGCCTGGTCTGCTGATGACACAGGTTGGGCNGGNNCNCNGG 
GGNGGlSnsnsnSTGNNNNGOSTGl^ 

Gisnsrcmnsnsn^is^^ 

IWTNNNNGGGTNCNNNCNClSrNNGGCGCGC

Sequence ID 1205 

CAGACTCTGACCCAGCCTCAGTCCTAACTCCTGGGGCTGGGCTGAGGGGAACAAGC 
ATTTGCTGAAACTTGAAAAAACAAAGCAAATCAAAAACAGGAAAAAATTGTACCTG 
GTACTTTTTTTTAGAAAAAAAGATTAAAAAAGAAAGAATAAATTCTTGTTTGGAAA
CTTGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
AAAAAATTTTAAACTC 
TNNNIWTNNCNNCNANTAANNC^
ACN 



Sequence ID - 1207 nt : 642

ACGAGAAGCCAGATACTAAAGAGAAGAANCCCGAAGCCAAGAAGGTTGATGCTGGT 

GGCAAGGTGAAAAAGGGTAACCTCAAAGCTAAAAAGCCCAAGAAGGGGAAGCCCCA 

TTGCAGCCGCAACCGTGTCCTTGTCAGAGGAATTGGCAGGTATTCCCGATCTGCCA 

TGTATTCCANAAAGGCCATGTACAAGAGGAAGTACTCAGCCGCTAAATCCAAGGTT 

GAAAAGAAAAAGAAGGAGAAGGTTCTCGCAACTGTTACAAAACCAGTTGGTGGTGA 

CAAGAACGGCGGTACCCGGGTGGTTAAACTTCGCAAAATGCCTAGATATTATCCTA

CTGAAGATGTGCC^tGAAAGCTGTTGAGCCACGGCAAAAAACCCTTCAGTCAGCAC 

GTGAGAAAACTGCGAGCCAGCATTACCCCCGGGACCATTCTGATCATCCTCACTGG 

ACGCCACAGGGGCAAGAGGGTGGTTTTCCTGAAGCAGCTGGCTAGTGGCTTATTAC 

TTGTGACTGGACCTCTGGTCCTCAATCGAGTTCCTCTACGAAGAACACACCAGAAA 

TTTGTCATTGCCACTTCAACCAAAATCGATATCAGCAATGTAAAAATCCCAAAACA 
TCTTACTGATGCTTACTTCAAAAAGA 

Sequence ID 1208 

CCCTATACCTTCTGCATAATGAATTANCTAGAAATAACTTTGCAAGGGAGAGCCAA 
AGCTAAGACCCCCGAAACCAGACGAGCTACCTAAGAACAGCTAAAAGAGCACACCC 
GTCTATGTAGCAAAATAGTGGGAAGATTTATAGGTAGAGGCGACAAACCTACCGAG 
CCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCAACTTTAAATTTGCCCA 
CAGAACCCTCTAAATCCCCTTGTAAATTTAACTGTTAGTCCAAAGAGGAACAGCTC 
TTTGGACACTAGGAAAAAACCTTGTAGAGAGAGTAAAAAATTTAACACCCATAGTA 
GGCCTAAAAGCAGCCACCAATTAAGAAAGCGTTCAAGCTCAACACCCACTACCTAA 
AAAATCCCAAACATATAACTGAACTCCTCACACCCAATTGGACCAATCTATCACCC 
TATAGAAGAACTAATGTTAGTATAAGTAACATGAAAACATTCTCCTCCGCATAAG 

Sequence ID - 1209 nt : 620

CTCTCCTGTCAACAGCGGCCAGCCTCCCAACTACGAGAATGCTCAAGGAGGAGCAG 
GAAGTGGCTATGCTGGGGGCGCCCCACAACCCTGCTCCCCCGACGTCCACCGTGAT 
CCACATCCGCAGCGAGACCTCCGTGCCCGACCATGTCGTCTGGTCCCTGTTCAACA 
CCCTCTTCATGAACACCTGCTGCCTGGGCTTCATAGCATTCGCCTACTCCGTGAAG 
TCTAGGGACAGGAAGATGGTTGGCGACGTGACCGGGGCCCAGGCCTATGCCTCCAC 
CGCCAAGTGCCTGAACATCTGGGCCCTGATTTTGGGCATCTTCATGACCATTCTGC 
TCGTCATCATCCCAGTGTTGGTCGTCCAGGCCCAGCGATAGATCAGGAGGCATCAT 
TGAGGCCAGGAGCTCTGCCCGTGACCTGTATCCCACGTACTCTATCTTCCATTCCT 
CGCCCTGCCCCCAGAGGCCAGGAGCTCTGCCCTTGACCTGTATTCCACTTACTCCA
CCTTCCATTCCTCGCCCTGTCCCCACAGCCGAGTCCTGCATCAGCCCTTTATCCTC 
ACACGCTTTTCTACAATGGCATTCAATAAAGTGTATATGTTTCTGGTGCTGCTGTG 
ACTT 


Sequence ID 1210 

TTCGTAATTAGAATACTGTTTGGACTTGCTCAACAAGCACCTTATCTTAACAAAAA 
GTAACTTATAGAAAAGGGAGACATTCATTTAACTTCAAGCCCATATTATTCTTAAA 
AGCTGACTCTTGAAATAGTATTTATTGAGTCATAGTGGAGTCATGGGACTTTTTAA 

GGGCCGGAAGGGACTATTTAGATCATCCAGTCCCACCCTGTCATTTTATGGAGGAG
GAAACTGAGGCCTAGATAAGATAACCAGTTAGTGGGTCCACTGACCTTTAGGACAG 
TAGTCTATCCGTAAGAGACAACATGGAGAAAGAAATACAACGTTTTTATAGTGAAT 
TATCATCTTACAAAGAATATTCTTCCCATATCGCACTTTTAAAAAGTGGGTACCTT 
AGTCAAATAGGAGAAAAAACCACTTGAGTAGTTTCATCCTCAGGTTTTAGGTGAGG 

AAACTGATACTCAGATTAAATAACTTTAAGCACACAGAGCCTGAATGATAGTCTTA
TTTGAGCTCATCTGTGCTTTTAATGTGTACTACGTTAGGTGTTTTCACTTGCATTT 
CCTTTAGTCTTATTTGAGCTCATCTGTGCTTTTAATGTGTACTACGTTAGGTGTTT 
TCACTTGCATTTCCTTGTTTGACGTTGACAATAAATCGTGAAGCTGCCTTATCTAA 
GGAAGTCCTAAAGTAAATCATTGGAACACA 


Sequence ID 1211 

CCATTGTGTTGGNACCCGGGAATTCGCGGCCGCGTCGACGGAGTTTTACCTTATTA 
CACTTTAATCTCTGGATTTACCCCATCTCATTTCTCTTTTAGGAAAACTGTTTGTA 
TGTGGTGGCTTTGATGGTTCTCATGCCATCAGTTGTGTGGAAATGTATGATCCAAC 

TAGAAATGAATGGAAGATGATGGGAAATATGACTTCACCAAGGAGCAATGCTGGGA

TTGCAACTGTAGGGAACACCATTTATGCAGTGGGAGGATTCGATGGCAATGAATTT 
CTGAATACGGTGGAAGTCTATAACCTTGAGTCAAATGAATGGAGCCCCTATACAAA 
GATTTTCCAGTTTTAAGAAATTTAAGACCCTCTCAAACTAACAGGCTTAGTGATGT 
AATTATGGTTAGCAGAGGTACACTTGTGAATAAAGAGGGTGGGTGGGTATAGATGT 

TGCTAACAGCAACACAAAGCTTTOGCATAT

ACTTTTTGGGTTTATTTGGAAAGGAATGCAAAGATGAAGGTCTGTTTTGTGTACTT 
TTAAGACTTTGGTTATTTTACTTTTTGGAAAAGAATAAACCAAGAATTGATTGGGC 
ACATCATTTCAAGAAG 


Sequence ID - 1212 nt : 374

AGAGCAGCAGCCATGGCCCTACGCTACCCTATGGCCGTGGGCCTCAACAAGGGCCA 
CAAAGTGACCAAGAACGTGAGCAAGCCCAGGCACAGCCGACGCCGCGGGCGTCTGA

CCAAACACACCAAGTTCGTGCGGGACATGATTCGGGAGGTGTGTGGCTTTGCCCCG 

TACGAGCGGCGCGCCATGGAGTTACTGAAGGTCTCCAAGGACAAACGGGCCCTCAA 

ATTTATCAAGAAAAGGGTGGGGACGCACATCCGCGCCAAGAGGAAGCGGGAGGAGC 

TGAGCAACGTACTGGCCGCCATGAGGAAAGCTGCTGCCAAGAAAGACTGAGCCCCT 
CCCCTGCCCTCTCCCTGAAATAAAGAACAGCTTGACAG 

Sequence ID - 1213 nt : 567 

GAATTATTGACTTTGAATTGCATTTCAGTACCATGAAGTCAAAGTCAGTGGTGTAT 

TTGCTCATTTGTTCATTCTTTCTTTTCCACCAACATTACTGCCTGCAGAGCCAGAG 

GTGAGTGCAGAAATCCTGTCAATTCGTCACTTGTGGACAACCTGCAGCTTGCCACA 

GCCTACAGTTCCACCACTGTGACCTCTGAAAACCTCCTGAACAAAAGGAAGGAGAC 

TTGGAAATCCTGAATGGGCTTGGAGACATTAAGGGAGAACTGCCTCCCTGGACCAA 

GGCAGAATTCAATAGAACCAGCAAGAAATTTTCCTATGA^TGGGAAAGCAGGTGGC 

AGGGGGCAGGGGTGGAAAAGCTTTGTACAGGAATTGTGGAAAAGCTTTTGCATTAT 

CTCTAGTCTGAAAGTCACATTTCTCAGTTCCTTTCCACTCTCTTCTGTCAACTTGC 

TGTGAGTAAATGACATCTGTCACCTGTGACACGGGCCAGGGACTATCACCATATGG 

CCCCCACACATTATCTAGTACCAGCCTGCCTGGGCCATGCCTTTTCCAGTCACTGT 
ACCAGCC 

Sequence ID - 1214 nt : 620

CTCTCCTGTCAACAGCGGCCAGCCTCCCAACTACGAGAATGCTCAAGGAGGAGCAG 

GAAGTGGCTATGCTGGGGGCGCCCCACAACCCTGCTCCCCCGACGTCCACCGTGAT 

CCACATCCGCAGCGAGACCTCCGTGCCCGACCATGTCGTCTGGTCCCTGTTCAACA 

CCCTCTTCATGAACACCTGCTGCCTGGGCTTCATAGCATTCGCCTACTCCGTGAAG 

TCTAGGGACAGGAAGATGGTTGGCGACGTGACCGGGGCCCAGGCCTATGCCTCCAC 

CGCCAAGTGCCTGAACATCTGGGCCCTGATTTTGGGCATCTTCATGACCATTCTGC 

TCGTCATCATCCCAGTGTTGGTCGTCCAGGCCCAGCGATAGATCAGGAGGCATCAT 

TGAGGCCAGGAGCTCTGCCCGTGACCTGTATCCCACGTACTCTATCTTCCATTCCT 

CGCCCTGCCCCCAGAGGCCAGGAGCTCTGCCCTTGACCTGTATTCCACTTACTCCA

CCTTCCATTCCTCGCCCTGTCCCCACAGCCGAGTCCTGCATCAGCCCTTTATCCTC 

ACACGCTTTTCTACAATGGCATTCAATAAAGTGTATATGTTTCTGGTGCTGCTGTG 

ACTT 

Sequence ID 1215 

CACAAGATAGAATGGTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATTTTAA 
GTGACAGTGCCATAGTTTGGACAGTACCTTTCAATGATTAATTTTAATAGCCTGTG 
AGTCCAAGTAAATGATCACTTTATTTGCTAGGGAGGGAAGTCCTAGGGTGGTTTCA
GTTTCTCCCAGACATACCTAAATTTTTACATCAATCCTTTTAAAGAAAATCTGTAT 
TTCAAAGAATCTTTCTCTGCAGTATATCTCGCAGGGGAATTTGCACTATTACACTT
GAAAGTTGTTATTGTTAACCTTTTCGGCAGCTTTTAATAGGAAAGTTAAACGTTTT 
AAACATGGTAGTACTGGAAATTTTACAAGACTTTTACCTAGCACTTAAATATGTAT
AAATGTACATAAAGACAAACTAGTAAGCATGACCTGGGGAAATGGTCAGACCTTGT 
ATTGTGTTTTTGGCCTTGAAAGTAGCAAGTGACCAGAATCTGCCATGGCAACAGGC 
TTTAAAAAAGACCCTTAAAAAGACACTGTCTCAACTGTGGTGTTAGCACCAGCCAG 
CTCTCTGTACATTTGCTAGCTTGTAGTTTTCTAAGACTGAGTAAACTTCTTATTTT 
TAGAAAGTGGAGGTCTGGTTTGTAACTTTCCTTGTACTTAATTGGGTAAAAGT

Sequence ID - 1216 nt : 484 

CAACCTTAGCCAAACCATTTACCCAAATAAAGTATAGGCGATAGAAATTGAAACCT 

GGCGCAATAGATATAGTACCGCAAGGGAAAGATGAAAAATTATAACCAAGCATAAT 

ATAGCAAGGACTAACCCCTATACCTTCTGCATAATGAATTAACTAGAAATAACTTT
GCAAGGAGAGCCAAAGCTAAGACCCCCGAAACCAGACGAGCTACCTAAGAACAGCT 
AAAAGAGCACACCCGTCTATGTAGCAAAATAGTGGGAAGATTTATAGGTAGAGGCG 
ACAAACCTACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCAAC 
TTTAAATTTGCCCACAGAACCCTCTAAATCCCCTTGTAAATTTAACTGTTAGTCCA 

AAGAGGAACAGCTCTTTGGACACTAGGAAAAAACCTTGTAGAGAGAGTAAAAAATT
TAACACCCATAGTAGGCCTAAAAGCAGCCACCAATT 

Sequence ID 1217 

GACAGGCGGGGGCCCAGCGGCCGGGTGAAGGCCGGGTGGCTCTGTGAATCAT^AGGA 

GAGTCCCAGAAAACCTGTGACTGTTGAAGAAAATTGATCTGTGAATTTTTATATTC

AAGGAGTCAGTATTTATATTCATCTTTTAAACTGGGAAGATTTATATTTTACTTTA 
AT^CTTCTTGATAATAATTTACAATGAATGGACACAGTGATGAAGAAAGTGTTAGA 
AACAGTAGTGGAGAATCAAGGTAAGTAAGCACTTTGTTATCAATTGTTTACTATGA 
AGAGAGTTGAAAACTTGACTTTTTTCTTTATTGTTATTGTTGTTATTTAGTTTTCC 

TCATAGGTAGCAGAGTTTTCAGGTTTTCCTCTTAGCTATCCAAATACTAAAAAAAT

TCTGATATACGAACCTTTTTTCATAATACAGGTTTTAATTATATTTTTCATTCAGA 
TACACAGTAGATCTTAAATATAGAAAGTTTTTGTTTACTTAAATCTATTTGGAAGT 
TTATATTTGAGCTAATAATTAAGCTGGAGCATGTATAATAGATTTAAATTGTTTTG 

ACTGTTAGTGAAATTT 


Sequence ID 1218 

CTCACTTGGTGGGTGAGCCTCCAATGACTACACCCAAGGAGGATTTAACACAGGGA 
TTTTATGACTTGCAACAAGTCAGGAGGACATGGGGTTGGGGTAGTTCAGCAGTGCC
TGTCTGAACAAAGGTGAAAATTGGGCTTTTATTGGGCTGATCAAGGGGGAGTAAAG 
GCAGCCAGGAGCAGTCGCCTGTCATGCTTCTACCTATATTGCATGTATAGAAAAGG 
GAAAATAAACTCCTTCCTGGGCAGGGTTTTAGTATGCTAAGGAGGGGAGTTATTCA 
ACTTCAATCCAACTCAAGCATCAGCATTGCTGCGTCCATCCCAGTTTTGTTTTGCT 
GGGGCTGAACTTCTTCCTATAACTTTTTGAAACAACAAGAACTCAAGGTGTGACAG 
TTACAAGTGGGCCCTTTTTCACAGTGTGTACCTAAACACGTGAGGACCCTGGATTA 
CAGAATGACAGACTCGAAGTGACTCAAGTTCCGGTTGTTCATCTTTAGATGGTAAA 
GATGGCTGTACGTACTATCCTTGCTTATTTCCAATCTATTGTTTAAACTCTTGTAT 
ATGTAATACCGQAGAGGCTAGAGATACAACCTTTGACCAAATGAGTGAATTCAAGT
AATCCATTACTAATGTGATCTGGAAACAAACATGGTGTTGAATGTGCATATGT 

Sequence ID - 1219 nt . 55g 

CTTGGCAGCTCCGTTATGTGCCCAGCTCTTTGCAAGGGCATACTGGGAAATGAGTG 

GAGATAAAGGACCCAATCATAAGCATTTTACAGTATGGATACCCCATTTTAAAAAG

CAGTGATTCATGAATCAGGCAGCACCAAACCAGAAGGAGGCTTTGCTGAANAAGGA 

TGAGGGACAAGCATTTATAAAGTGAATGTAGATGTAATACAAAGAAAATATTTGAA

CCGGGTGCGGTGGCTTACACTTGTAATCCCAACACTTTGGGAGGCCAAGGCGGGCA 
GATCACAAGATCAAGAGATCGAGACCATCCTGGTCAACATGGTGAAACCCCATCTN 
TACTAAAAAATACAAAAATTANCTGGGCGTGGTGGTGCGTGCCTGTAGTCCCAGCT 
ACTTGGGCGGCTGAGGCAGGANAATTGCTTGAACCCGGGAGGTGGAGGTTGCAGTA 
AGCCGAGATTGCACCATTGCACTACTCCAGCCTGGTGACAGAGAGAGACTCCATC 

Sequence ID 1220 

GANmGTGCGATANNATGNNTGTCTTTTTTTTAAAGTNTTTCNNATNGNAGNGAAN 

CCCCCNNANNTNNCATAANGAGAGATNACTACNGTACANATAGNGNCANACNGATA 

GTAGTANCAANATTGTNTTAGCTANATNANTCAATAGATATCNAGATANAANAANA 

NCNNGGATATACAGCGATGTNTNANNGGNNNNNNNANGGAACGAACATCNAC 

ANNATAAGCTNGNGGAGAGAGACANGTANGTTATANANNAGAATNGNAGTAGGNGT 

GATCATAATAGISnSTNimANOTANTATATANGATNT^ 

CNNNAA.TNTCTATNCTNGAGAGNAGCOTQNATNNNNAGGCGANGANATTGGGNm 
CTC^TIWATAGANANCTGGTGTCNNANAANTACNTCATCTATTNANCTCTCACNANA 
TGGNANNATANAGNAGNGNNNTNNANAGGANTANGCATAGNGNNTNNCT'NAAACAA 
AANNNATAAGANNTCTCGNNAANANGGGCCTNTNNT^ 

ATANTTNTTCNCTCTT^AATAlTOTANGATANATGANCTNGNNGTGATANATAlSnsm 
NNTACNGTNAANNTNTANTCNTATAATAGATANAAATATAGGATNTTNCTCTGGCN
GGTNGAANANTTNNTN CNNTTTNAATAATGNTGTTAGNGACNGNGNTNTNANAJKnsnST
N1STTTAGAAAGGTACTCTATATACTNNTATGNTNCGGCNNATAATANAACAGATGTT 
TGTATNAATATNAAANAAGGTCNNTTTCGNCAAGAGAANNNTGNCTGGTNATAGAA 
TTAGCATAANTTAI^TANTA^ 
NAGTCATTNNGNATNTATNNNGNNTANTAGTNANTTGGGNC
TNTGNGAANATGAAISTCTC^ 

ANANlSrraTTNTANNANTGNCTATANATTGCCm 

TAGCCCGCNCTAAGGAlSlNTl^GTNANNTAAAlSrNTCT CAGATAANNTACNTNTTNTSTT 
TATTAANC^ANNATCACANTATANCN^ 
TATNACNGNTCOSfNNC^

NCTAGACNATTTNNANTGAAAN^ 

CTAaySTTCCGCMSFTNTTTNTACAGAT^ 

NGCTNNNACT 



Sequence ID - 1221 nt: 741

AAGCAGAANTNTCTCTAAAAACATTATCTCCTTAAAATCTTGAGGTGCATATNAGA 
GCCACAGGCAATCTCTGACATATAAAATTGCAGTACAGGCCTTTCAAATTTGGCAT 
TTCACTGGTACAATACAACAACCAAGATATATAATAACTGTACAGTGCCTAGACAT 
TCCAGTAAGAACCATTATTTTCTTTAATGTAGAATGATTAATACATATTCTACAAG 

GGGCAGTAAGGTTAGTAATTCTATAGGGTATGTCCCGACATAATTTTCAAATTGTA
CAATAACACAAACAACTTTGTTAAGGCCATGTTTTATTTGCTGATTAATGGACAAA 
AGGCAATGTAATTTATTTTCAAGTATTTTCTTGAAAGTCTGTGCTCATAAAAATCA 
TGAAAAGTTGGAAAGACTGTTAAATCACTGAAACTTCAAATATATCTTACACAATC 
TTGTTTGTACAAAAATACAAGTTAAATATAAACATAAAGCAATCATGGTAATTTTA 

TGCAAATCTGTTTTATGTGATCAT

TTTGTGAAAAGATCAATACCAGATTGAATGACTACCTATTGGCAAAGGGCCCTAAA 
AAGCTTACTTTAGCACTCATCTTTTACATGGTTAAATGCATTTCCTAATTTGAGAT 
CACCTAAACACTGGAAAAGAAAAAAAATGAAAGGGCAGTATGTCCATAAACCAACA 
AATAATTTGGCTG 

Sequence ID - 1224 nt : 485 

CGAAATTTCCTTGTGACACAGAGGAAGGGCAAAGGTCTGAGCCCAGAGTTGACGGA 
GGGAGTATTTCAGGGTTCACTTCAGGGGCTCCCAAAGCGACAAGATCGTTAGGGAG 
AGAGGCCCAGGGTGGGGACTGGGAATTTAAGGAGAGCTGGGAACGGATCCCTTAGG 

TTCAGGAAGCTTCTGTGCAAGCTGCGAGGATGGCTTGGGCCGAAGGGTTGCTCTGC

CCGCCGCGCTAGCTGTGAGCTGAGCAAAGCCCTGGGCTCACAGCACCCCAAAAGCC 
TGTGGCTTCAGTCCTGCGTCTGCACCACACAATCAAAAGGATCGTTTTGTTTTGTT
TTTAAAGAAAGGTGAGATTGGCTTGGTTCTTCATGAGCACATTTGATATAGCTCTT

TTTCTGTTTTTCCTTGCTCATTTCGTTTTGGGGAAGAAATCTGTACTGTATTGGGA 
TTGTAAAGAACATCTCTGCACTCAGACAGTTTACAGA 

Sequence ID 1226

GGTTTTTATACTTGCCATGAAACTGTTCTTTGGGATATTATTTTGTTCAGGTTCCC 

CACTTGGACAGCAGAGGGGGTGACTCTGCCCATCCCTGCCACTGGTAGCCAGGCGG 

GCAATGTCTGCTAGCAGTCTGCTTCTGTCTGAACTCAGCCAGCAGAGGCAAACTCC 

CGGTTCCCCGAGAAACACTCTGAAGGCAGGGTGGGTGACTCCACCCACCACCGCCT 

CTCCTAGCCATGCAGGCCATGTCTGCTAGAGCTTCCAGCGCAGTGGTCCTAATTCT 

GTCTGAATCCGGCTGAGGGGTGCAGCCTCCTGTTACTGCCCAGGGAAACACCCAGA 

TGGCAGGGTGGGTGACTCCAACCACCTCTGCCTGTGGTAGCCAGATGGGCCACACC 

TGCTAGAGCTTCCAGCCCAGCAGTCCCGCTACTCTGTGGGTGGGTGCCATCCCCTG 

TTCCTCTGGGAAGCACCCAGACAGCTGATTACGTGACCCCACCCACTTCTGCAGAT 

CCTAGCTGAGCAGGACTTGCTGGTTTGGACAATGCCCAAGCAGGGAAGAGCCCTCA 

TTCTCTTATCACTGACAGAGGTGAGATGTCCGANTTTGTANGCTGGTGGAGGAGTG 
AGGTGGAGGAGGTATGCCTCT 

Sequence ID 1228 

GTTATTCAGGTATCCATCAAAATTTTATAAGAGGGCCGGAAACATCGGCTCACACC 

TGTAATCCCAGCACTTTGGGAGGCTGAGGCAGGTGGTTCACTTGAGGTCAGGAGTT 

CGAGACCAGCCTGGCCAACATGGCAAAACCCCGTCACTATTAAAAATACAAAACAT 

TAGCTGGGTGTAGTGGCAGGTGCCTGTAATCCCAGCTATTCGGGAGGCCTAGGAAG 

GAAAATGGCTTGAACCTGGGGGTGGAGGTTGGAGTGAGGCAAGATCACACCACTGC

ACTCCAGCCTGGGCGACAGAGCGAGACTCCATCTCAAAAGAAGAAAAAAAAAACAA 

CAAAAAAACCTTTATCAGATTATCAGAGGTTATCACTACAGAGGGAGGTAAAATTG 
GAGGGAAAAGGGTACAAATTTATTTCAC 

Sequence ID - 1230 nt: 741

AAGCAGAANTNTCTCTAAAAACATTATCTCCTTAAAATCTTGAGGTGCATATNAGA 
GCCACAGGCAATCTCTGACATATAAAATTGCAGTACAGGCCTTTCAAATTTGGCAT 
TTCACTGGTACAATACAACAACCAAGATATATAATAACTGTACAGTGCCTAGACAT 
TCCAGTAAGAACCATTATTTTCTTTAATGTAGAATGATTAATACATATTCTACAAG 
GGGCAGTAAGGTTAGTAATTCTATAGGGTATGTCCCGACATAATTTTCAAATTGTA 
CAATAACACAAACAACTTTGTTAAGGCCATGTTTTATTTGCTGATTAATGGACAAA 
AGGCAATGTAATTTATTTTCAAGTATTTTCTTGAAAGTCTGTGCTCATAAAAATCA 
TGAAAAGTTGGAAAGACTGTTAAATCACTGAAACTTCAAATATATCTTACACAATC
TTGTTTGTACAAAAATACAAGTTAAATATAAACATAAAGCAATCATGGTAATTTTA

TGCAAATCTGTTTTATGTGATCATCAGTTATATATAAAAGTTTCTCAGTTCTGTTA 

TTTGTGAAAAGATCAATACCAGATTGAATGACTACCTATTGGCAAAGGGCCCTAAA 

AAGCTTACTTTAGCACTCATCTTTTACATGGTTAAATGCATTTCCTAATTTGAGAT 

CACCTAAACACTGGAAAAGAAAAAAAATGAAAGGGCAGTATGTCCATAAACCAACA 
AATAATTTGGCTG 

Sequence ID - 1231 nt : 203 

TTGAGGAAGGGTCTACTGTCTTTTTAAATGGCACAATTTTAAGAGGTTTGAGAGGT 
ACAGTCCCTTAACCTGCCACGGGAGAGGGGCCCCCAAACTTTCTTCCCCCCACACT 
TCTGGTTTTCTGTGTGGAGGGGGAGCAGGGATATCTAAGCTGTGGTGTGAAAGGGT 
AGGAGAGATGCTGGAGGTGGGGGTGCTGTGTTCTA 

Sequence ID 1239

TTTCCTCGGGAAGCGCGCCATTGTGTTGGTACCCGGGAATTCGCGGCCGCGTCGAC 

TACAATy^ATATAGCAATACAGNGAACTTCACCAAATCCTAAATATTCAGTACCTGA 

ACTGGCTACAACACCGNGTGCACACCCAGTTCCTGCAGAATCTCTTGCAGATATGG 

GAGAGTCAGCCAGTGAAAAGATCCATTTCTTGGGAATCCTTGTCAACAAGACCAGT 

TCAGAAATCCAGGATATATAGAAGCCTACTGTAATTTAAAAACAGTAACAAAAACC 

CCAACAAAACCCAAATCAACAAAGACCAAGATAAAGGNGTGATAAACATTAATTGT 

AATGGTTTTCCTTTACATGCAATACATGCATTTTAAAATCACTAAGAAACACGAAA 

TTTTGTAGAGCAAAGTTTGNGTTTCACGTAAGTGCAAATGAATATATATTTTATTT 

TTTATACTATTAAATTATATATATTTTTTCCATACAAAAGCACACAGTGTTAATCT 

ATAAAATGACATCCAAGTGGATGATGATTGTTTTTGCATGTCCCCCTGCTTAGATT 

TTTTTAAAATATATAGTCAAAAATTAACATCCTTCTTTAAAAATACAGAAGGGAAA 

AANGGGCAAAAAAAAAAATCTAGACTCGAGCAAGCTTATGCATGCATGCGGCCGCA 

ATTCGANCTCGGNCGACTTGGCCAATTCGCCCTATAGNGAGTCGNATTACAATTCA 

CTGGGCCGNCGNTTTACAACGTCGNGACTGGGAAAACCCTGGCGTTACCCNNCTNA 

TCGNCTTGNAACAATNCCCNTTTNGCCAGNGGGG 

Sequence ID 1255 

TCACTTCGTATNGAANCTGTTTGGACTTGCTCAAGAAGACCTTATCTTAACAAAAA 
GTAACTTATAGAAAAGGGAGACATTCATTTAACTTCAAGCCCATATTATTCTTAAA 
AGCTGACTCTTGAAATAGTATTTATTGAGTCATAGTGGAGTCATGGGACTTTTTAA 
GGGCCGGAAGGGACTATTTAGATCATCCAGTCCCACCCTGTCATTTTATGGAGGAG 
GAAACTGAGGCCTAGATAAGATAACCAGTTAGTGGGTCCACTGACCTTTAGGACAG
TAGTCTATCCGTAAGAGACAACATGGAGAAAGAAATACAACGTTTTTATAGTGAAT

TATCATCTTACAAAGAATATTCTTCCCATATCGCACTTTTAAAAAGTGGGTACCTT 

AGTCAAATAGGAGAAAAAACCACTTGAGTAGTTTCATCCTCAGGTTTTAGGTGAGG 

AAACTGATACTCAGATTAAATAACTTTAAGCACACAGAGCCTGAATGATAGTCTTA 

TTTGAGCTCATCTGTGCTTTTAATGTGTACTACGTTAGGTGTTTTCACTTGCATTT

CCTTTAGTCTTATTTGAGCTCATCTGTGCTTTTAATGTGTACTACGTTAGGTGTTT 

TCACTTGCATTTCCTTGTTTGACGTTGACAATAAATCGTGAAGCTGCCTTATCTAA 

GNAGTCCTAAAGTAAATCATTGGAACACATGTANCCAGTTTGTTGTTTTTATTTGC 

CAGGTNTCAAATATAACTGAAAACCCATGCTAACTGACTNATTTTAAAAGNTGTNT 

GGGGCATGAAANGATTGCTCTGCCTGGGCGGGNGGTTNANCCTGNGTCCCCCNTTT 

NGGAGNCCACCCANGANGCGATATTTNAGGGNNGATTCNAAACCCCTGGCACGNGN 
NAACCCCNTTTTTAAANANAAAANAWCGGNNG 

Sequence ID 1256

TTGTGTTGGTACCCGGGAATTCGCGGCCGCGTCGACGGAGTTTTACCTTATTACAC 

TTTAATCTCTGGATTTACCCCATCTCATTTCTCTTTTAGGAAAACTGTTTGTATGT 

GGTGGCTTTGATGGTTCTCATGCCATCAGTTGTGTGGAAATGTATGATCCAACTAG 

AAATGAATGGAAGATGATGGGAAATATGACTTCACCAAGGAGCAATGCTGGGATTG 

CAACTGTAGGGAACACCATTTATGCAGTGGGAGGATTCGATGGCAATGAATTTCTG 

AATACGGTGGAAGTCTATAACCTTGAGTCAAATGAATGGAGCCCCTATACAAAGAT 

TTTCCAGTTTTAACAAATTTAAGACCCTCTCAAACTAACAGGCTTAGTGATGTAAT 

TATGGTTAGCAGAGGTACACTTGTGAATAAAGAGGGTGGGTGGGTATAGATGTTGC 

TAACAGCAACACAAAGCTTTTGCATATTGCATACTATTAAACATGCTGTACATACT 

TTTTGGGTTTATTTGGAAAGGAATGCAAAGATGAAGGTCTGTTTTGTGTACTTTTA 

AGACTTTGGTTATTTTACTTTTTGGAAAAGAATAAACCAAGAATTGATTGGGCACA 

TCATTTCAAGAAGTCCCCTCTCCTCCACATTTGTTTTGCCAATTTGCACATTAAAT 

GACTCTTCCCTCAAATGTGTACTATGGGGTAAAAGGGGTAGGGNTTAAANATGTAA 

ACAGTTGGGTTTTTTAAGGGNCCTTTTTCATAACTGGAACACTCTNTACAAGGNTN 

CTTNTTAAATAAATAACTTGACTTTTTTGTTTTNTAAANGNANCTTCNTGCTTCCA 

TAAAAAAAAAAATTTAANTNGNC^STCTNTGCTGCTGCGNCCANTTNGCTNGNCCNT 

GGCATTCCCTAGGGANGNTNAATANTGGCNmTTAACNNGGCNGNAACNNNNNCCA 
NT 

Sequence ID 1331 

GGGCGATGCATGCTTTATTAAGGCTCTTGTTTCACCTGGCAGTGTACTGTATCAAC 
GTATAATACAGAAAAAAAATCTCTTTAAGGTCCTCCTTCACAAAGACATAGAGTGA 
AACTCCCTTTACATGTCAGTATTTGTTCAACACTTTAGGCAACTTGACTGTCAGTG
TTAAAATGGAAAACAGGAAAATGGAAAAATCTGACCAATTCTGCCACCTTGAGACT
TTCATATAGACCTTGCACAACT^TTGTATAGATCACACACCGGCTGTATTTAATAT 
GTAACATTTTCACACATATTAAAGATAC^GAAGTATTAAAAAACCCCCAATGTTAA 
TGTATTTGCTTAAAAGGCACAAGTTTCACATATCTGTCTAGCTATCTGTTGGTAAT 
ACAGAAAGTATACTACTTTTTTAAAAAAGTGGGCAGAATTCTTGTGTATGTATATT
TGTGTGTACAGTATGTGTATGTGTGTATATATATATATTATATATATAGATAATAT 
ATAAATATTTTTTTTAAGGAGAAACTAGAATGTTTAGCTAGAAAATTCCACAGCCT 
GTGAAGAAATATTTCAZ^AATGGCCATAAAGGAGGTAAAAATGAAAACCATAACCTA 
ACTTTTATAGAGGCTTTATCTTTAATTTAACGATGTGCGGAGGACTTTCTTGCTTG 
AATCTGTTCCGGC^CTGTCTGCTCTGTCCATCAAATGGGCAGGTCTGGGAATGAGGC
ACCTTCGGCCGTTCAGAAGTGGCCTGAACAGAATGCTGGAACCCAGGCTGGACTCG 
GAC 

Sequence ID 1332 

CAAACCTGCATGTTCTGCACATGTATCCAGGAACTTAAAAAAAAAAAAAGATAGTT
TGTGTGTCTTAATTGAATAATAGTAGATTTATAGATTAAAGATCTATGGGTTTTTA 
ATATGGATTAGAAATCTGTGGGTTTTTGATATGGATTAGAAATCTGTGGGTTTTTA 
ATATGGATTGGAAATCTGTGGGTTTTTAATATGGATTAAAAAACATCTGTGGGTTT 
TTAATATGGATTAAACATCTGTGGGTTTTTAATATGGATTAAACATCTGGGTTTTT 

AATATGGATTAAACATCTGTGGGTTTTTAATATGGGTTAAAAATCAAAAGAAAATG
AACTATTTGCTCCAGTGCAGGAAT^ATACAGGCAATACTGGATACAATTAGATGGTC 
AGGAGCGATAACCCGGTTGCCATTGTTTGAAGAAGAGAATAAGGTGCTAGCATTCC 
TATCCGTAGATAATTTGACAGCTAGGAAATAGGGGGAGTCTTCTATGTAGTTAGTG 
AAGGCTAAATGAACTATTATATGCAGTTATCGTAGAAGAGTACTCAAAAAAATCTG 

TAAAAAATAAAGAAAGGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTG

GGAGGCCGAGGCGGGTGGATCATGAGGTCAGGAGATCGAGACCATCCTGGCTACCA 
NGGTGAAACCCCCGTCT 

Sequence ID 1335 

CAAGACTCCATCTCAAAAAAAAAAAAAAATCTACAGTGCTGAGTATATAAAATTAT

TAACACATTTCACAACAATATGTGTTTGTGGAGTTAAATATTTTTTGTCTTTAAAA 
CAGGTAATTTTAGTGCATACTTAATTTGATGATTAAATATGGTAGT^ATTAAGCATT 
TTAAATGTTAATGTTTGTTACATTGTTCAAGAAATAAGTAGAAATATATTCCTTTG 
TTTTTTATTTAAATTTTTGTTCCTCTGTAAACTAAAAGAACACGAAGTAATTGGTC 
ACAATTACTGGTGTTTAACTGCCAAATATGGGTAAATAAGGGAAAATTTTGTTTAA
TATTTAGTCCTTCTGAGATGGCTTGAATATTTGAATTTTGTTGTACGTCTATACTG 
GGTAGTCACAAGTCTTATAAACACTTTAGAGGAAAGATGGATTTCAGTCTGTATTT
TTAAACATCATTTATTTTAAATCTGGTGCTGAAAAATAAGAAAAAAATTAAACTGC

ATTCTGCTGTTCTTCTTTAGAAGCATTCCTGCGTAAATACTGCTGTAATACTGTCA 

TGCAAAGTGTATCCTTTCTTGTCGTATCCTTTTTGGGGCAGTGTTTTTTTGTTTTT 

TTCCTAGAAATGTTTGTCCTTCCCCCACCTGTTGATCCAGGTTAAGGAATACTTTT 
TTACACTTTATTCAAA 

Sequence ID 1336 

CTTTTCCTCCCGCTGTCCCCCACGGAGGGGACTGCTCTCCCCCGCTGCATCCTTTC 

TGTGAGGTACCTTACCCACCTCAGCACCTGAGAGGGTGAAATAGAATTCTAACCTC 

GACATTCGGGAAGTGTTTTTGAGAAGTCTCGGTCGGTAAGGGAAGTCTTCCAAGTC 

CGTGCAGCACTAACGTATTGGCACCTGCCTCCTCTTCGGCCACCCCCCAGATGAGG 

CAGCTGTGACTGTGTCAAGGGAAGCCACGACTCTGACCATAGTCTTCTCTCAGCTT 

CCACTGCCGTCTCCACAGGAAACCCAGAAGTTCTGTGAACAAGTCCATGCTGCCAT 

CAAGGCATTTATTGCAGTGTACTATTTGCTTCCAAAGGATCAGGCCCTGAGAACAA 

TGACCTTATTTCCTACAACAGTGTCTGGGTTGCGTGCCAGCAGATGCCTCAGATAC 

CAAGAGATAACAAAGCTGCAGCTCTTTTGATGCTGACCAAGAATGTGGATTTTGTG 

AAGGATGCACATGAAGAAATGGAGCAGGCTGTGGAAGAATGTGACCCTTACTCTGG 

CCTCTTGAATGATACTGAGGAGAACAACTCTGACAACCACAATCATGAGGATGATG 

TGTTGGGGTTTCCCAGCAATCAGGACTTGTATTGGTCAGAGGACGATCAAGAGCTC 

ATAATCCCATGCCTTGCGCTGGTGAGAGCATCCAAAGCCTGCCTGAAGAAAA 

Sequence ID 1337

CAAGAACTCTGGGACATTTGCAAAGGGTATGGCATATGTGTAATGGGAATACCAGA 
GGAGAGGAAAGACAGGAAGTCAAAAAAAGAATTTTTCCAAATTAATGATAGGTTCC 
AAACCACAGATGCAGGAAGCTTAAACACCAACAGGATAAATAAAACAAAATCTACG
CTTAAGCATATCATACTTAACCTGCAGAAAATTACAGACAAAGAAAAAACACCAGA 
GGGGAAGCTGGCAGAAACATACCACCTATAGCGGAAGAAGAATAAGAATTACATCA 
GACTTCCCTTCAGAAATCTTGCAAACAAAAAGATGTAGCACAATATTTAAAGTATT 
AAAGGAGGCCGGGCCCGGTGGCTCGGGCCTGTAATCCTAACACTTTGGGAGGCTGA 
GGCAGGAGGACCATGAGGTCAGGAGATCGAGACCATCCTGGTGATGGTGATACCCC 
ATCTCTACTAAAAATACAAAAAATTAACCGGGCATGGTGACACGCACCTGTAATCC 
CAGCTACTTGGGAGGCTGAAGCAGGAGAATCGTTTGAGCCCAGGAGGTGGAGGTTG 
CAGTGAGCCGAGATCACATCACTGCACGCCTGGGCAACAGAGCGAGACTCCATCTC 
AAAAAA 

Sequence ID 1338 

CGACCCGTTTTAGTCAGGATGGTCTCGATCTCCTGACCTCGTGATCCGCCTGCCTC
GGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCCGGCGTAAATCAG
GTTTTTTAAATGTTTGCCAAACCTTATCACTGACTTTTATAACAAAATTATTTACT 
ATAATCATTAGGGAATATTTAAGTTCTGCTAATACTTAAAATTGCAGAGTGCTAAA 
ACCAGCAGTGAGTTTAGAATCAAGCTAAGCTTTATTGTTGCTACTATTTGAGGCAT 
ATTAGTTGACTGGTGTTCATATGCAAGGCAGTCTACTGGGTGCAACAAGGGTTAGA
AGGATATTTTTAAAAAACTGACCCTATTCTCAGGATGAAAATAATACACTAGTAAT 
AGTCTGCTCTGTTGGTTAACTCCTCGTAAGGAGGTACAATTAAAATGCTGTAGTGT 
TGCAAGGGAAGGAGAGGAAGAATCATATTCCTTCACTAGCAGGATCAAGAAAGCTT 
TTATAGAAATATACAAAATCTTCACTTCTTGAAGGATTGGTAAAATTTAATAGCCA 
ACATTGGGCACT^ATTCATTCTCTGAGTAAATATTTATTGCATGCTTATCTTGTAT
CAAGCATTGTGATGAAAGCACAAGAATGAAAGAGGAGGGAGAATGTTTAGAGAATA 
AGGGCTGAAACACAGATTTTGTAGGGAGCGTAGGGGAGACTGANAAGACAGGTTCA 
GGTTAGTAAGGGCGCTCATATTTTGACCCTGAATGTTAACTATGTGCACATCATGC 
TAGCTATTCTAAATCAGGCATTTTCAAATGGAAGCAGGCACTGACATTTT 

Sequence ID 1344 

CGTGAAGGGTCTTTATGTATTAGTATTAGAGTGATCTTTTGATTATTTTCCTCACT 
ATAAGGAAATTATTTCCTCAGGATGAGCTGCCATAACATTCCACTGTCTGATGGCA 
ATTTTAAAGCCTGAAATTGAAGCCCATGGCTAGGCTATGAGAACCCTAGTTCGTAT 
AGTAAAGTTGATATCTTCTGGATGTATACTAATTTTAGGCTTTATTTTAAAACTGC
TGGAAACTGAAACTTAGACAAAAGTATTTTCAGGACATCATTTACAATGTTTAGCC 
CTAAAGAGTCAAGCTGTGGGATTCTGAGTCTTTCATATGTTACAGCAGAAACTTAA 
AAGCAAGAGGAAATTGGCTGGGCACAGTGGCTCTGTAATCCCAGCACTTTGGGAGG 
CTGAGGTGGGTGGATCATGAGGTCAAGAGATTGAGACCATCCTAGCCAACATGGTG 

AAACCCC^TCTCTACTAAAAATACAAAAATTAGCTGGGCGTGGTGGCACACGCCTG

TAATCCCAGCTAGTCAGGAGGCTGAGGCAGGAGAATATCTTGAACTTGGGAGGCAG 
AGGTTGCAGTGAGCCAAGATTACATCACTGCACTCCAGCCTGGTGACAGAGCGAGA 
CTCCGACT 

Sequence ID 1348

CTGAAACTGCACTGAACCCACAGGTAGGTTACATCACAGGACAGAAATCTGAGGAG 
CTGGAGAAAGCAAAAGAATAAAGGATGGGCTGACACCAGAAGGAATTAAAGGAATT 
TTTATACTGAACTTCAATTACTTGTTCATTTGAAGTTTGTTTTTTTAATGAACGTT 
TTTGCTGTTACTTAAATATAGTGTTTTGAAAGTGTTTCAAATGTATTCAAGTTGGG 
ATTTTCCATATTTTACTACAGTTCTGTCTTAGTATGTTCACCATAAAACACTTATC
ATTAAAGCTCACAAAGTGCTTTTTTGTAATATGAGGATAAAATGAAGCCATATAAG 
AATTTTTTTATATCTGTACATTTAACCCACATTTGAGCTTTAGCCAAAATATATAG
CTTTTTTTTTTCTGACCTGGCCAACGTATTATCCAGCAAACATCAACTGAAGCAAT

ATGGAAACACTTCCAAATGTTTGCCAATAATGCTATTAAGTGACTGATGTCAACAT

TAGTTACATGGCAAACTAAAGAGGCATTATACATTTTTAAAACACACTAACATATA 

ACTGTAGATAATGTAAGGTTTATTTATATGCATATTTCATAGTATATTTAAATGTT 

TAAATATAAAAAAGGGTTTTTAAACACTTTTAATTTTTATCTTTGATTTTTTTTAT 

TGATATCTCTTTCCAGGCTACTAATAAAATTGCCAGAACTAAACTATCAGGTAAAG 

GTTAAGGCATCAATTGACAAGTAAGTTTTCTAATTTCGTTTTGAATTACAATTCCA 

AATGTAAGACTTTTAAAAATGAATGGCCTTTATTTTATAGAATAATTTTGACCTTT 

TAAATTTACTTATCTAACATTATATAATGAATGTACTTCAAATATTTGACTTTGAA 

GTCAACATTAACAAATTCATGGATCCTAATTAAAATTTACTATAAAACTGGAATCA
TTTATTACTTCCTi;'

Sequence ID 1351 

TTTTTTTTTTTTTAAAAGAGATGGGTTCTCACTATGTTGCCCATAATGTTTATGAG 

ATTAAGTTCATCTTTTTTATCTGAGTAGTATTTTATTGTATGAATATACCACCATT 

TATTTATCTGTTGGTTATTTCCAGTTTTGGGCTATAATCCAAAATGCTTTTTTCAA 

ACAATAGGCTATATATCATTAATGTCCGTTTATCAGCAGTATAAAATATCTTACCA 

TAAATATTAATAAAAGAAGCATTCATATATAAAATATAGATATTTCAAACCCTACA 

GAGGGCCTTTTAATGATTAAATATTTTGTCCTTACAAAAAGGTCCAGGTAATTACA 

CCCATGAGGTTAACCTGCCTTAGTGCAGGACTTAAAATAAGGCTTCTCCTGCCATC 

TCTCTCCATTTGTAGAATGTGAAATTCTTTAAAATGCATCCTATATTAGGAATACT 

ATAGCTGTGCACTGGTGTTTGTTCTCTTCTTTAAACTCGGGACCGTATATATCTGC 

TCAAATTGCCCAAGTATACATATGCTGCACTCCATCAAGTGTCAGGCCACATTCTA 

TCAGCACAGCGTGACTGCCTATCAGTGACAATATAAGTGAGCTCTATTTGGATCCC 

TCTTACCCTACCTTTTATATTTATGACAGCATTATCATAAAACTCCAATATTCTTC 

AATAACTTACATGTTTGTTGTAGGATAAAATTATTACCCTCAATGAACTACAT 

Sequence ID 1352 

ACCAGCTTCTTCACAGGTTCCACGAGTCATGTCAACACAGCGTGTTGCTAACACAT 
CAACACAGACAATGGGTCCACGTCCTGCAGCTGCAGCCGCTGCAGCTACTCCTGCT 
GTCCGCACCGTTCCACAGTATAAATATGCTGCAGGAGTTCGCAATCCTCAGCAACA 
TCTTAATGCACAGCCACAAGTTACAATGCAACAGCCTGCTGTTCATGTACAAGGTC 
AGGAACCTTTGACTGCTTCCATGTTGGCATCTGCCCCTCCTCAAGAGCAAAAGCAA 
ATGTTGGGTGAACGGCTGTTTCCTCTTATTCAAGCCATGCACCCTACTCTTGCTGG 
TAAAATCACTGGCATGTTGTTGGAGATTGATAATTCAGAACTTCTTCATATGCTCG 
AGTCTCCAGAGTCACTCCGTTCTAAGGTTGATGAAGCTGTAGCTGTACTACAAGCC 
CACCAAGCTAAAGAGGCTGCCCAGAAAGCAGTTAACAGTGCCACCGGTGTTCCAAC
TGTTTAAAATTGATCAGGGACCATGAAAAGAAACTTGTGCTTCACCGAAGAAAAAT
ATCTAAACATCGAAAAACTTAAATAT 

TAAATAAAAAAAGGAAAGGAAACTTTGAACCTTATGTACCGAGCAAATGCCAGGTC 
TAGCAAACATAATGCTAGTCCTAGATTACTTATTGATTTAAAA 

Sequence ID 1353 

ACATTCTGGAAAAGGCAAAAGGGAGGAAGAACTGATTAGTGGTTAGCCCAGGGTTA 
GAGTTGGGGAGAGGATATAATGAGGGAACTTTTGTGGATTCTGTACCATGATTATG 
ATTACACAAACCTATGCATACATTGAAACACATAGAACTATACGTTGAAAAAAGTG 

AATCTGCCTGTATGTAAATTTAAAAGAAAAATATTTTTTTAAAAAAACAGATGCTT
CTTAACACATTATCATCTATGTCAGTTTAACAGTTAGTAGACTTAGGCCAGGTGTC 
ATGGCTCACTCCTGTAATCCCAGTGCTTTGGGAGTCTGAGGTGGGACGATCTCTTG 
AGACTAGGAGGGAGTTTGAGACAAACCTAGGCAATGTAATGAGACTCTTTCTCTAC 
AAAAAATTTTAAAGTTATCTGGACATGGTGGTGCCTGCCTGTAGTCCCAGCTACTT 

GGGAGGCTGAGGTGGGAGGATTCCTTGAGCCCAGAAGTTCAAGGCTACAGTGTGCT
ATGATAGAGCCACTGCACTCCAGCCTGGGCAACCAGGTGAGACCTTGTCTCTAAAA 
TGAATAAATAAAT 

Sequence ID 1355

TGGTCTTTCACCCAGCCAGGGAGAAGGTTCTTCGCTCAGTATGAAGAAAAGCAACC
CAAAACTCTCAATCTGATTTGTTTTTGTTTATGTCGATGCCCTGTAGTTTGAAAGT 
GAAGTAAAGATTTAGAATTCACCTAAGTCCAAAGGAAAACACGTGGTTTTTAAAGC 
CATTAGGTAAAAAAAGTTCTCAATAAAGGCATTACAATTTTTTAGGTTTAGAAAGA 
TGGACTTTTCTGATAAATCTTGGCA.GACATCTAAAAAAAAAACCATATTTTTCACA 

AGAAAATGCAAGTTACTTTTTTTGGAAATAATACTCACTGATTATGGATAAAATGG

AATATTTTCAGATACTATATTGGCTGTTTCAAAATAGTACTATTCTTTAAACTTGT 
AATTTTTGCTAAGTTATTTGTCTTTGTTGTATCTATAAATATGTAAAAAATATTTA 

Sequence ID 1359

CGGGATCCCTAGTATAAGACATTCAGTGTTCCCCTTTCAGTCTTACTACTTTGACC 
GCGATGATGTGGCTTTGAAGAACTTTGCCAAATACTTTCTTCACCAATCTCATGAG 
GAGAGGGAACATGCTGAGAAACTGATGAAGCTGCAGAACCAACGAGGTGGCCGAAT 
CTTCCTTCAGGATATCAAGAAACCAGACTGTGATGACTGGGAGAGCGGGCTGAATG 
CAATGGAGTGTGCATTACATTTGGAAAAAATGTGAATCAGTCACTACTGGAACTGC
ACAAACTGGCCACTGACAAAAATGACCCCCATGTGAGTATTGGAACCCCAGGAAAT 
AAATGGAGGAAATCATTTGCCTTAGGGATTGGGAAAGCTGCCCACTAACTGTCTTC
CCCATTGTTTTGCAGTTGTGTGACTTCATTC5AGACACATTACCTGAATGAGCAGGT

GAAAGCCATCAAAGAATTGGGTGACCACGTGACCAACTTGCGCAAGATGGGAGCGC 

CCGAATCTGGCTTGGCGGAATATCTCTTTGACAAGCACACCCTGGGAGACAGTGAT 

AATGAAAGCTAAGCCTCGGGCTAATTTCCCCATAGCCGTGGGGTGACTTCCCTGGT 

CACCAAGGCAGTGCATGCATGTTGGGGTTTCCTTTACCTTTTCTATAAGTTGTACC 

AAAACATCCACTTAAGTTCTTTGATTTGTACCATTCCTTCAAATAAAGAAATTTGG 
TACC 

Sequence ID 1360 

TGCGCAGACCAGACTTCGCTCGTACTCGTGCGCCTCGCTTCGCTTTTCCTCCGCAA 

CCATGTCTGACAAACCCGATATGGCTGAGATCGAGAAATTCGATAAGTCGAAACTG 

AAGAAGACAGAGACGCAAGAGAAAAATCCACTGCCTTCCAAAGAAACGATTGAACA 

GGAGAAGCAAGCAGGCGAATCGTAATGAGGCGTGCGCCGCCAATATGCACTGTACA 

TTCCACAAGCATTGCCTTCTTATTTTACTTCTTTTAGCTGTTTAACTTTGTAAGAT 

GCAAAGAGGTTGGATCAAGTTTAAATGACTGTGCTGCCCCTTTCACATCAAAGAAC 

TACTGACAACGAAGGCCGCGCCTGCCTTTCCCATCTGTCTATCTATCTGGCTGGCA 

GGGAAGGAAAGAACTTGCATGTTGGTGAAGGAAGAAGTGGGGTGGAAGAAGTGGGG 
TGGGACGACAGTGAAAT 

Sequence ID 1361 

TATAAATACACTCCGGGATGATTTACCCCCGGAGGTCAGCTAGTAAAATACATGAG 

TAGAATTCCTTAAAGTATGTGATAATTGCTCATCACTATCCAAGTGTGACATAAAT 

CATAAAAAGAATTGACAAAATCAGGGTCGCAAAGAGAATTGAAAAAAATCTGTCAC 

AACCAAAATTTAAATTGACCTCTGTCCTAGAGTATGAGAGCCACACTGAACAGAAA 

AACCAGATAAATCTTTTATAAAATATTCATTTGCAGCCCCATTAACGTTGCTTGTC 

ACCCCACCTCCCCATGTCCTTGGACAAACTGAATGTATAGTAACATCATCCCAGGC 

CAGGCGCGGTGGCTCATGCCTGTAATCCCAGCACTTTGTGAGGCTAAGGCAGGCAG 

ATCAGGAGGTCAGGAGTT<^.GGACCAGCCTGGCCAAAAAGGTGAAACTCCGTCTCT 

ACTAACAATACAAAAATTAGCTGGGTGCGGTAGTAGGCGCCTGTAATCCCAGCTAC 

TCGGGAGGCTGAGGCAGGAGAATTGCTCAAACCCGGAAGGTGGAGGTTGCAGTGAG 

CTGAGATCGTGCCACTGCACTCCAGCCTGGGTGACAGAGCAAGACTCTGTCTCGGG 

GAGGGGGGTGGCGGAGATAAAGAAATAACATCATCTTATACTGTCAAGCTCAAGGT 

GTCTGCAGCCTTATCTTCAGGGGAAGTTGTGTCTTTCTCAGGGAAGATACAGATTT 

CAATTTAGAGCAAGACAGAGAGAAGTTACATTCAGAGAGGAAAATGCAGTAGTCTA 
ACTG

Sequence ID 1364

GCGGCCGCGCTCTTTTCAATTTTTAAAAAGAAGTTTGTTTTCCATTTCAGTAATTT 
CTGCTTTGATCTTCCTTATGTCCTCCTATTGAGTTGATCAGCTTTCTTTATTCTTG 
CCTTTTCTCCTCTGTGTGCCCTTTCTATTAACGTATTTACCCTTAGGCTGGGCACA 
ATGGCTGATGCCTGTAATCCCTGCACTTTGGGAGGCCGAGGCAGGTGGATCACCTA
AGGTCAGGAGTTCAAGACCAGCCTGGCCAACATGGTGAAACCTGGTCTCTACTAAA 
AACACAAAAATTAGCCAGGCATGGTGGTGTGCACCTGTAATCCCAGCTACTCAGGA 
GGCTGAGGCAGGAGAATTGCTTGAACCTGGGAGGCGGAGATTGTGCCAAAGCACTC 
CAGCCTGGGCAACAAAATGAGACTTTGTGTC 

Sequence ID 1365

CACCAGGCTGTCTTCAGATACTTCATACAGAAATGAGCCTCCCTGTGGGGTCCTCT 
TCCCTCCTTCAGCCTGTCCATCAACACAGCATTGCGGGATCCTTACCATGGCATCC 
AGCCCTGGAGATGCTTCAGGAAAGTTGCAGGTCCATGCTGCAGGACAGGCTCAGAT 
CAGCAGAGACGCATCTCACATCGGGCTGTGAAATTCAAGTTGAGCTGCAATTGGCA
ATGAGAA 

Sequence ID 1366 

GTTATTCACTGAGACCGTGCCCCGGTTATGAGGTTGTACCAGAAAGCAAGTATTCA 
CTATGCACACTATTCACCGCTCACCCTAGCATTGAAGCCAGCCTGTAGCCTGAAAG
CCTTTGCTTTGAGGGCAGGTCTTTCCCCAAAATGCAGACACGAAGGTGCAAAGTGA 
AGCTGCCAGTCTTGCAAAAGATGTAACTTGTCACGAAGGCCACGAGTGGCAGGGAG 
AGCTGTCCCACATTTGCGGAAGTGGCTATGTGAGGACGGGGGAGGCGGGTCCCTTA 
GAGATGAGACAATCATAAGGGGAGATATCAGAGAAAATCGTAAGGGGAGCAGATGG 

TTGTCAAGAGAATAGGCTGACCATCGAAGGACTGGCAGAAGCTTTCAGAAAACCAC

TGGACGGCTGGGCACAGTGGCTTAGGCCTGTAATCCCAGCACTTTGGGAGGCTGAC 
GCAGGTGAATCACTTGAGGTCAGGAGTTCCAGACC^^ 

CCCCATCTCTAC^GAAAATATAAAAATTAGCCAGGCGTGGTGGCACAAGCCTAGAA 
TCCCAGCTACTTGGGAGGCTGAGGCAGGCGAATGGCTTGAACCCAGGAGTCAGAGG 

CTGCAGTGAGTCGAGATTGTTCCACTGCACTCCAGCCTGGGTGACAGTGCAAGACT

CCTTCCAAAAAAAAA 

Sequence ID 1367 

TTCGTGAGTGATGGCGTCCCGGGTTGCTTGCCGGTGCTGGCCGCCGCCGGGAGAGC 
CCGGGGCAGAGCAGAGGTGCTCATCAGCACTGTAGGCCCGGAAGATTGTGTGGTCC
CGTTCCTGACCCGGCCTAAGGTCCCTGTCTTGCAGCTGGATAGCGGCAACTACCTC 
TTCTCCACTAGTGCAATCTGCCGATATTTTTTTTTGTTATCTGGCTGGGAGCAAGA
TGACCTCACTAACCAGTGGCTGGAATGGGAAGCGACAGAGCTGCAGCCAGCTTTGT

CTGCTGCCCTGTACTATTTAGTGGTCCAAGGCAAGAAGGGGGAAGATGTTCTTGGT 

TCAGTGCGGAGAGCCCTGACTCACATTGACCACAGCTTGAGTCGTCAGAACTGTCC 

TTTCCTGGCTGGGGAGACAGAATCTCTAGCCGACATTGTTTTGTGGGGAGCCCTAT 

ACCCATTACTGCAAGATCCCGCCTACCTCCCTGAGGAGCTGAGTGCCCTGCACAGC 

TGGTTCCAGACACTGAGTACCCAGGAACCATGTCAGCGAGCTGCAGAGACTGTACT 

GAAACAGCAAGGTGTCCTGGCTCTCCGGCCTTACCTCCAAAAGCAGCCCCAGCCCA 

GCCCCGCTGAGGGAAGGGCTGTCACCAATGAGCCTGAGGAGGAGGAGCTGGCTACC 

CTATCTGAGGAGGAGATTGCTATGGCTGTTACTGCTTGGGAGAANGGCCTAGAAAG 

TTTTGCCCCCGGTGCGGCCCCAGCANAATCCAGTGTTGCCTGTGGCTGGAGAAAGG 

AATGTGCTCATCACCAGTGCCCTCCNTTACGTCAACAATGTCCCCCACCTTGGGAA 
CATCATTGGTTGTGTGCTCAGTGCCCGATGTCTT 

Sequence ID 1368

CAGTGAGCCAAGATCACACCACTGCACTCCAGCCTGGACAACAGAACGAGACTCCA 

TATCAAAAAAATTAAATTAAAATATAATAAATTTCTTGCCGGGCGCAGTGGCTCAC 

ACCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACGAAGTCAGGAGA 

TTGAGACCATCCTGGCTAATACAGTGAAACCCCGTCTCTACTATAAATACAAAAAA 

TTAGCTGGGCATGGTGGCGGGCGTCTGTAGTCCCAGCTACTCAGGAGTCTGAGGCA 

GGAGAATGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCGTGCCACT 

GCAATCCAGCCTGGGCAGCAGAACGAGACTCCATCTCAAATAAATAAATAAATAAA 

ATGAATTTCAGCTAGAAGAGCCTTATTCCATTTTCCTTTTTATTAAACATCTGGCA

TAAGTTGGTAAGTATGTGAAGTTTATCATATATTCTTATGCGAATTATTATTTTCG

CCTTTTTTTTTATAATTCTGTCTGGGATTTGAATAGTAGAGTTTGAATTCAGGAAG 
GACACCTGTGATAGGACAATAAAAT 

Sequence ID 1369 

CTGATTGCAAAAACATTACAACTCAGTACTGCGGCTTTCATTCAAATAGGTTATAT 
GTATAAACTGAGGTTCAACAATATTGTATTTGAGATGGGAAAGTTAAAGAAATGCA 
ATAATGTAAATAATACTTAAGAAAATAAGATCTCAGGAAACTGTGTATACTCTGTA 
CTTTTATGCAACTTTATCAGATCATTTCAGTATATGCATCAAGGATATAGTGTATA 
TGACATGAACTTTGAGTGCAAAAACTGTACTATGTACCTTTTGTTTATTTTGCTGT 

Sequence ID 1370 

CGAAAGGACTACAGAGCCCCGAATTAATACCAATAGAAGGGCAATGCTTTTAGATT 
AAAATGAAGGTGACTTAAACAGCTTAAAGTTTAGTTTAAAAGTTGTAGGTGATTAA
AATAATTTGAAGGCGATCTTTTAAAAAGAGATTAAACCGAAGGTGATTAAAAGACC
TTGAAATCCATGACGCAGGGAGAATTGCGTCATTTAAAGCCTAGTTAACGCATTTA 
CTAAACGCAGACGAAAATGGAAAGATTAATTGGGAGTGGTAGGATGAAACAATTTG 
GAGAAGATAGAAGTTTGAAGTGGAAAACTGGAAGACAGAAGTACGGGAAGGCGAAG 
AAAAGAATAGATAAGATAGGGAAATTAGAAGATAAAAACATACTTTTAGAAGAAAA
AAGATAAATTTAAACCTGAAAAGTAGGAAG 

Sequence ID 1371 

GTCCAGNAGAAAGTTCAGTGACTTGTCCAGAGCTGCAGGTCTTAAGAGGCTGAAAT 
CTCGCCTCTGCCTCGAGGCTGCGGTTCCACTGACCCATACTACTTGCCTTCAGGAA
AGAGAAATGGTGTAGGAAGGCTGTGGATGAAGACGCTTACATTCATGAAGGATTTG 
GATAGGCGAACATGAGCTTTTCCACCAAATTTCAGAATTTTAAGAAATGCCTTAAA 
TTATTTCTTAAAAATCAATTTGGGGCAGACGAGAAGTTCTGATAATAGTTTTTAGG 
GAACATGATAAAATTCTGACCTTAGAAGTGGTATACCAGTTTGAGAAGAAGAACAA
GCTATAAACGGTGTAGATAACATTCACGGCTATTTAAGAAAGAGTTACTAAGGGAA
ACCAGAATGACTTAAGAGTGTTACTCTTCTTTTTCTGAGAGAACAATAGCATCATC 
TCAGAAAGCCTTTCATGCCATTAATAGGTAAGAATCTGGGCTTCTTGGACCATGGG 
TTAGACTTTCTTACAAAACCATAATATGCATTTCCTAGCAAAATTTATGCTATTAC 
ATTTCCTTATCTCAACAAAGACTGGTAAATTCAGTACTTATTCCTCAATTTTCCTA 

CCCTTAAAATGGGGATATTCTGCCTCTCCAAGGAATGCTGGGAACAAGCAAGTCCT

CATGTTAGGGGTCTTTGAGTTTTCATGGAAGTTTAGGTTATTTATATGATGACATA 
GTTGTCAACTTACTTTCAGGATGGACTTTTCTTTTGTGAGTTTGTGACCTAAATAC 
AATAGTTGTTATGCATGTCCAGTTTATGGAAGTACCACTGCAATANCAG 

Sequence ID 1372

CAGTGCAGCCAAGTATCACACCACTGCACTCCAGTCCTGGACAACAGAAACGANTA 
CTCCATATCAAAAAAATTAAATTAAANGATAATAAATTTCTTGCCGGGCGCAGTGG 
CTCACACCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACGAAGTCA 
GGAGATTGAGACCATCCTGGCTAATACAGTGAAATCCCCGTCTCTACTATAAATAC 

AAAAAATTAGCTGGGCATGGTGGCGGGCGTCTGTAGTCCCAGCTACTCAGGAGTCT

GAGGCAGGAGAATGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCGT 
GCCACTGCAATCCAGCCTGGGCAGCAGAACGAGACTCCATCTCAAATAAATAAATA 
AATAAAATGAATTTCAGCTAGAAGAGCCTTATTCCATTTTCCTTTTTATTAAACAT 
CTGGCATAAGTTGGTAAGTATGTGAAGTTTATCATATATTCTTATGCGAATTATTA 
TTTTCGCCTTTTTTTTTATAATTCTGTCTGGGATTTGAATAGTAGAGTTTGAATTC
AGGAAGGACACCTGTGATAGGACAATAAAATCTA

Sequence ID 1374

GAAAGCACATATGATATACATGTGTGTCATATGTATTATTTTGTTTGCCATCTGAG 
TCTTCAAAATTTGTTACAGAATACCTGCATATTAATATTTCAAGGTATGGATTAAT 

Sequence ID 1378 

CTGAGTATTAACTAAAAAAAAAAAAAAAAAAAAAAAAAAA 
Sequence ID 1380 

CCAAACCCAACTGGTCCAGTAGGATACTCACCTTACAGGGGGCGTCTCAAGAGTCT 

CACAGTTCCCTTGGGTCTTAAGAGACTCACTGTTGGACCAGGCGTGGTGACTCACG 

CCTGTAAAACCAGCACTTTGGGAGGCCGAGGCGGGCGGATCAGTTGAGGTCAAGAG 

TTCAAGACCAGCCTGACCAAGGTGCTGAAACCCCGTCTCTACTAAAAATACAAAAA 

TTAGCCAGGCATGGTGGTGTGCGCCTGTAATCCCAGCTACTCCAGAGGCTGAGGCA 

GGAGAATCTCTTGAACCCAGGAGGTGGAGGTTGCAGTGAGTCGAGATCATGCCACT 

GCACTCCAGCCTGGGTGACAGAGCGAGACTCCGTCTTAGAAAAAAAAAAAAAAAAA 

AAAAGAACCTCACAGTTCAGCAGGGTTCTAGCATGAGACAATGAGGACAAGGGTAG 

GTGAGCAGGTGGAAAGAGTGAGAACAGGTCAATTGTGATGGAGAAAATAATAAAGA 

CAGAAAAGGCAGAAGACTGCCTGGCAGAAGACCTGTCCCAGCAGATACAAAAATAC 

AGACAACAGGAGCCAGCATAGACCCTTGACCTGTGTAAGTCTTTCTCAGGCCTTCT 

TTTAAGTAGAAACATGCCTTTGAAAAAAAGTTTTAATAAACAGGAAAATCATAAAT 

CCCTATTTACATAAATAATATATCCTGGTCTTATTCTTAAAACCATTGATTTTTCA 

CGGCTCATTAANAAAGCTGGGCGAGGTGGCTCACGCCCGTCATCCTAGCACTTTGG 

GAGGCCGAGGCGGGCANATCACAAGGTGAGGAGTTGGGAGACCAGCCTGACCAACA 

CGGTGAAACCCAGTCTCTACTAAAAATACAAAAATTANCTGGGGGTGGTGGTGTGT 

GCCTGTAATCCAAGCTACTCGGGAGGCTGAGGCAGGA 

Sequence ID 1382

CTTACTACCTCGAACATGAAACAAGCAGCCCCGCACTTCTCGAAGGTCTGAGTTAC 
TTGGAATCGTTTTACCACATGATGGACAGAAGGAATATTTCAGATATCTCTGAAAA 
CCTCAAGCGTTACCTTCTTCAGTATTTTAAGCCAGTGATTGACAGGCAAAGCTGGA 
GTGACAAGGGCTCAGTCTGGGACAGGATGCTCCGCTCGGCTCTCTTGAAGCTGGCC 
TGTGACCTGAACCATGCTCCTTGCATCCAGAAAGCTGCTGAACTCTTCTCCCAGTG 
GATGGAATCCAGTGGAAAATTAAATATACCAACAGATGTTTTAAAGATTGTGTATT 
CTGTGGGTGCTCAGACAACAGCAGGATGGAATTACCTTTTAGAGCAATATGAACTG 
TCAATGTCAAGTGCTGAACAAAACAAAATTCTGTATGCTTTGTCAACGAGCAAGCA 
TCAGGAAAAGTTACTGAAGTTAATTGAACTAGGAATGGAAGGAAAGGTTATCAAGA 
CACAGAACTTGGCAGCTCTCCTTCATGCGATTGCCAGACGTCCAAAGGGGCAGCAA
CTAGCATGGGATTTTGTAAGAGAAAATTGGACCCATCTTCTGAAAAAATTTGACTT
GGGCTCATATGACATAAGGATGATCATCTCTGGCACAACAGCTCACTTTTCTTCCA 
AGGATAAGTTGCAAGAGGTGAAACTATTTTTTGAATCTCTTGAGGCTCAAGGATCA 
CATCTGGATATTTTTCAAACTGTTCTGGAAACGATAACCAAAAATATAAAATGGCT 
GGAGAAGAATCTTCCGACTCTGAGGACTTGGCTAATGGTTAATACTTAAATGGTCA
ATAGAAAAAGTAGGCTGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGA 



Sequence ID 1387




GTTAAAGTAGGTTTGTCGACGCGGCCACGAATTTCCCGGGGACCAA 
Sequence ID 1389

GNTTTTCGGAAACGGAGTCTCGCTTTCTCGCCCACTCTGGAGTGGNGCAGTGGGGN
GGTCTCAGCTCACCACAGCCTCCACCTCCTGGGCCCAAGCGATCCTNTCACCTCAG 
CCTCCTGCGTAGCTGGGACTACAGGCGTGCACCACCATTCCCAGGTAATTTTTGTA 
TTTTTTGTANANACAGGGTTTCACTGTTGTTGCCCAGGCTGGTCTCGAACTCCTGC 
TTCAGTCTGCCANAATGCTGGATTCTAGGCGTGAGCCACCGNGCCTGGCCCAAAAG 

TTACTTTTCTTACAGAAGCAAAGCTTTAATGCATTTTACTGAATGCTTATAGCTTT
GTAGATACTGAAAAGAGTATGAGCGTCACATACAGACACATNTAACAGCACTGCCT 
CCAACCAGCCCCTACCCACTGGTCAGGNGAGTAANAATCAAAATTCTTTTCTGNGA 
GTGGAACGGAAATTTCATCTCTCCTCCTCAGGCAAGTAGTTAANAGGCTGGNGGGA 
GTCATGGCCCCATTTTGTTCAAAATACAAGCTCCACAGGAACAAAAGGCTGAACTG 

CTCACCTCCCAACTGATGAACCTCGTC^

TACTGCAGCAGAAACTCGAGCTATCAAACCATCAGGCACCAAAAGTAAAACTCCTT 
TCTCTAAAAAGACCTCTCTTTACCTGAGCCTTTCAATGCATCTTTGCCCCCANATA 
ATCCTGGATGAGATAATCCCCAGAGGAANACCAGCGCTTGCCTAGTGAAATTATAC 
TATGAGACAAGGGTAAAAGACCTCAAANACCGGGTTGGCAGGTAAGGGAGTAGGGN 

Sequence ID 1390 

TCNGTGGCACCCGTTTCCGGCACCTTCAGACTCTGAAGAGCCACCTGCGAATCCAC 
ACAGGAGAGAAACCTTACCATGTACGTAAGCCTCTTGAGGCCGCTCTCTGACCTGC 
GGGGATGTGGAGGGCAGGGAAGGAGGTGGAGCGCAGGGAAGGAGGTGGAGCAGGGA 

GGCAGTGGAACTGTTTGCTCCCATCTCAAGCACAGAGTGGGGCAACCACTACGCTA

ATGGTTGGAAGACCTAGATCTGGGCCCAATGGCCAGACACCCTGCTTGACCTTGGC 
CCAAGCATTAGGGGACTCATCTTTAAAATGAGGGTATGGGACTAGATGATCTGGGC
CTTAGGAGAGGAGT

Sequence ID 1391

CGGCTNCTACCCTGCGGAGATCACACTGACCTGGCAGTGGGATGGGGAGGACCAAA 
CTCAGGACACCGAGCTTGTGGAGACCAGGCCAGCAGGAGATGGAACCTTCCAGAAG
TGGGCAGCTGTGGTGGTGCCTTCTGGAGAAGAGCAGAGATACACGTGCCATGTTCA 
GCACGAGGGGCTGCCGGAGCCCCTCACCCTGAGATGGAAGCCGTCTTCCCAGCCCA 
CCATCCCCATCGTGGGCATCGTTGCTGGCCTGGCTGTCCTGGCTGTCCTAGCTGTC 
CTAGGAGCTATGGTGGCTGTTGTGATGTGTAGGAGGAAGAGCTCAGGTGGAAAAGG 

AGGGAGCTGCTQTCAGGCTGCGTCCAGCAACAGTGCCCAGGGCTCTGATGAGTCTC
TCATCGCTTGTAAAGCCTGAGACAGCTGCCTGTGTGGGACTGAGATGCAGGATTTC 
TTCACACCTCTCCTTTGTGACTTCAAGAGCCTCTGGCATCTCTTTCTGCAAAGGCA 
TCTGAATGTGTCTGCGTTCCTGTTAGCATAATGTGAGGAGGTGGAGAGACAGCCCA 
CCCCCGTGTCCACCGTGACCCCTGTCCCCACACTGACCTGTGTTCCCTCCCCGATC 

ATCTTTCCTGTTCCAGAGAAGTGGGCTGGATGTCTCCATCTCTGTCTCAACTTCAT
GGTGCGCTGAGCTGCAACTTCTTACTTCCCTAATGAAGTTAAGAACCTGAATATAA 
ATTTGTTTTCTCAAATATTTGCTATGAAGGGTTGATGGATTAATTAAATAAGTCAA 
TTCCTGGAAGTTGAGAGAGCAAATAAAGACCTGAGAACCTTCCANAATCCG 

Sequence ID 1392

TGAAACAAAATGAATTTNTATGGGTAAGAGAGGGTAATATTTTAGAGTTGTGTTAC
AAAACTACAAATTTTTATTAAATTAATAAATCAGAATACTAAATCCATGTGTTTTT 
TTCTTTCTTAAAAAATATCTTTTGGCTGGGCACGGTAGCTCATGGCTGTAATCCCA 
GCACTTTGGGAGGCTGAGGTGGGTGGATCGCCTGATGTCAGGAGTTCAAGACCAGC 
CTGGTCAACATGTTGAAACCCCATCTCTACTAAAAATATAAAAATTAGCCGGTGTG
GTGGTGGGCGCCTGTAATCCCAGCTACTCAGGAGGCTAAGGCAGGAGAATTGCGTG
AACCCAGGAGTTCAGTGATGTAGCGGGGAGCTGAGATTGTGCCACTACACTCCAGC 
CTGGATGACAGAGTGAGACTCCATCTCAAAAAAAAAAAAAAAAAA 

Sequence ID 1394

GCATAATGTGAGGAGGTGGAGAGACAGCCCACCCCCGTGTCCACCGTGACCCCTGT 
TCCCATGCTGACTTGTGTTTCCTCCCCAGTCATCTTTCCTGTTCCAGAGAGGTGGG 
GCTGGATGTCTCCATCTCTGTCTCAACTTTATGTGCACTGAGCTGCAACTTCTTAC 
TTCCCTACTGAAAATAAGAATCTGAATATAAATTTGTTTTCTCAAATATTTGCTAT 

GAGAGGTTGATGGATTAATTAAATAAGTCAATTCCTGGAATTTGAGAGAGCAAATA
AAGACCTGAGAACCTTCCAGAAAAAAAAAAAAAAAAAAAAAAAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 

Sequence ID 1395

CTTACCATGTCAGTGCACAGAAATGCTGTCTTGGGATGTAGGAAAAATAAATCCAC 
AAAAGCTACCAAGTTTGAAGGGGACCATGAGTCTTCAGGCTGGAGCTTCCAAACCA 
GATGAAAACCCCACAATTAACCTGCAGTTTAAGATCCAGCAGCTGGCCATTTCTGG 
ACTCAAGGTGAATCGTCTGGATATGTATGGAGAAAAGTACAAACCCTTTAAGGGCA
TAAAATACATGACCAAAGCTGGGAAGTTCCAAGTTCGAACCTGAAGGGAGCATTTG 
CTGAGGGAATAGTCTTGCAC^TTTTTTCATTTCTTACTTGTCTAAAAGTAAAAAAA 
AATATCAGCCTGTCTCCTAGGTCAGTCCCCTCCTGGACCCACCCGCTCCCTTTTTT 
CCTTAGCCTTCAGTGCCATGGAACTAATCAAGGGAGGAAAAGGTCACCAGGGAGAA 
CTGGACAGAACTGAAACACAGCAACACCAGTTCTCAAGGACAAGGTGTGTGATGGG
GGTAGGAAGCTTGGTGCTTATGTAACCATTTTAAACGTGGTTTCTATAGGAAAGAC 
CAACATTTGTTTAGCTTGCTTGGCTTTAATTATCTAAAGCCAATGAAAGACTTCTT 



Sequence ID 1396

CAAACACTATGTTATTTTATGAANAAGACTTGAACATCTATGGATTTTGGTATTTG 
CAAGGGGTGAATGGGGTATTTGCAAGCAGTGAATGAGGAGGCCTGGAACCAATCTT 
CTGCTGATATTGAGGCACAACTGAAAAAGGTATATTACTTAAATCTCTTATTGTAT 
TGTAAACTGTATAAGTAATGAAATTAAAAGGCAGAAATTGTCAGACTGAATAAAAT 

GAAAAGACCAAACAATATGCTGCTTACAAGAAACACAATTCAAATATAAGGACACA
ATTAGTTTAAAGGAAAAGAACTGGAAAAGATATACCATGATAACACAAGTCAGAAG 
AAAGCTGCTGTGGATATATTAATATGAGATGTAGATTTGAGAGCAGTGAATATTGC 
CAGGCATAAAGAAAGTTATTACATAATAATTAAGGTATCAGTTCATCAAGAAGATG 
TAATAACCCTAAGTATTTATACAACTAATATCAGAGCTTCAAAATACATGAAGCAA 

AAACCAGTGGAATTGATAGGAGAAACACACAATTACACAATTATAGTCAGAATTTT

CAACATATCTTTCTCAATGGAGAAAACAACTAGACAGGAAATCATTAAGGATATAG 
ATGATTTAAATTATATGATCAACTACCTGGACGTAATTGGCATTTATGGAACACTG 
CACCACCAACAGCAGAGTACATATTATTTTCAAGTACACAGAAAACAGTTACCAAT 
ATAGACCATTTTCTGGGTCATAAAACACATCTCAATAAATGTAAAACAATTAATGT 

TATATAAAGTATGTGCTCTGACCNCAAAGGAATTAGAGATCAATAAAAGAACATCT

TTGAAAAATCTCACNTATTTAAAAACTAATAACTCACTTCTAAATAACTCCTGTNT 
CAAGAGAATNAAANGG 

Sequence ID 1397

CCCAGCCTCACTGCGCCCCGTCAGGCCAGGCAGCTGCCCTCAGGGTCTGCCAAGGT 
GGGGGTCAAGGGCCATGGGGGCAGGTAGCTCTGCCTGCAAAGCCCACAAGCATGTC
AGATCACCTC3GGCTGCAGACAGACAAACACCTGAGCTGTTCTGAATACCTTCAGGT

TCCTGGCCTCGCTGAGCAAGTGCAGAAATTTTTACCTTCAAGGATCAGGGTTTTTC 

TGTTTGTTTGTTTTTTAACACACACATATGTGAACAAAGAGTATGCGTTTGTACTG 

GCAGAAGAAGCGTCTGGTAAGACAACCAGCAAGTTAACAATGGTCACCTCCAGAAA 

TGGGCTGGGTAAACCAAAGAATTTTTTTGTTTTTGTTTTTTTTGAGTCAGGGTCTA 

GCTCTGTCACCCAGGCTGGAACGCACTGGTGTGATCACGGCTCACTGCAGCCTTGA 

CCTCCCTGGCTCAAGCAATCCTCCCAGCTCAGCCTCCTGAGTCGTTGGGACTACAG 

GCACGTGCCACCACGCCTGACACATTTTTTAAATTTTTGTAGAGACAGTGTTTCAC 

CATGTTGCCCAGGCAGGTCTCAAACTCCTGGGCTCAAGTGGTCCTCCAGCTTCAGC 

CTCCCAAAGTGCTAGGATTATAGGTGTGAGCCACAGTGCCCAGCCCCGTAGTGGAG 

AATTTCTGTTGAATGAACCAAAAGCAACTGCCAACCTCTCCATGCACCATGTGTTT 

CAGAGGAGAAAGCACAGTGAAGAATGCAGTGTGTTCTGAGGTCCTGTCACCCCTGA 

GGCTGTGTGTGTCCTTTGCCAAATTAAAGAGTCTTACTGAATGCGGTGCATCCAGG 
AGACAGGCCNAGGTTTGGACTGGTAAAAAAAAA 

Sequence ID 1399

CAGACACCTGGNAGAACGGGAAGGAGACGCTGCAGCGCGCGGACCCCCCAAAGACA 

CATGTGACCCACCACCCCATCTNTGACCATGAGGCCACCCTGAGGTGCTGGGCCCT 

GGGCTTCTACCCTGCGGAGATCACACTGACCTGGCAGCGGGATGGCGAGGACCAAA 

CTCAGGACACCGAGCTTGTGGAGACCAGACCAGCAGGAGACAGAACCTTCCAGAAG 

TGGGCAGCTGTGGTGGTGCCTTCTGGAGAAGAGCAGAGATACACATGCCATGTACA 

GCATGAGGGGCTGCCGAAGCCCCTCACCCTGAGATGGGAGCCATCTTCCCAGTCCA 

CCGTCCCCATCGTGGGCATTGTTGCTGGCCTGGCTGTCCTAGCAGTTGTGGTCATC 

GGAGCTGTGGTCGCTGCTGTGATGTGTAGGAGGAAGAGTTCAGGTGGAAAAGGAGG 

GAGCTACTCTCAGGCTGCGTCCAGCGACAGTGCCCAGGGCTCTGATGTGTCTCTCA 

CAGCTTGAAAAGCCTGAGACAGCTGTNTTGTGAGGGACTGAGATGCAGGATTTCTT 

CACGCCTCCCCTTTGTGACTTCAAGAGCCTCTGGCATCTCTTTCTGCAAAGGCACC 

TGAATGTGTCTGCGTCCTTGTTAGCATAATGTGAGGAGGTGGAGAGACAGCCCACC 

CTTGTGTCAACTGTGACCCCCTGTTCCCATGCTGACCTGTGTTTCCTCCCCAGTCA 

TCTTTTTTGTTCNCAATAGGTGGGGCCTGGATGTCTCCATCTCTGTNTCA 



Sequence ID 1440 

TTATAAGGTACTTTTAAGGTATTTTAGTTGTCTTAGTCTATATTTCTGTACTCACC 
TTTCTTTATCCACTCATCAGTTGATGGGCATGTAGGTTGGTTCCATATCTTTGCAA 
TTCTGAATTGTGCTGTGATCAGGTGTCTTTTTAGTATAATGATTTACTCTCCTTTG 
GGTAGATACCCAGTAGTGGGATTGCTGGATCGAATGGTTTTTATAATTTTCTATTT
TACCACAGTTTCTCTCTGCATTTTTCCTCTTTGACCACTAACCATGTGAAATTCTC
ATATTGACCTTTATAATGATCATGAACTCTTAGTATCATTGGGAAGGCCACATTTG 
CCACTTATGATTGTAAACCTTATCCTCCATTTTTCCTGTTATTGTTGGTGCAAAAA 
GCACCTATTATACCAGGACTTTAAAAATCAGTCTGATAAGTCTTTGATAAGTCTAA 
TAATAATAACTGATAAGTCCATTGAATTTGCTTCTGATTACTTTTTCTTTAGTAGC
TAAACATGTATGTACTCCTATGATTACAATGAACACTCCTCTCCATTTAAATTAAT 
TATTTACATTGATGAAATAGCAAAATGTTAATGACTAAATACTGTCTTGGTTTTTT 
CGTTCCAGGTCAGTCAATATTAACTTCTTATAATTTTCTTTTTTTTCTTT 

Sequence ID 1447

GCAAGGACTAACCCCTATACCTTCTGCATAATGAATTAACTAGAAATAACTTTGCA 
AGGAGAGCCAAAGCTAAGACCCCCGAAACCAGACGAGCTACCTAAGAACAGCTAAA 
AGAGCACACCCGTCTATGTAGCAAAATAGTGGGAAGATTTATAGGTAGAGGCGACA 
AACCTACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCAACTTT 

AAATTTGCCCACAGAACCCTCTAAATCCCCTTGTAAATTTAACTGTTAGTCCAAAG
AGGAACAGCTCTTTGGACACTAGGAAAAAACCTTGTAGAGAGAGTAAAAAATTTAA 
CACCCATAGTAGGCCTAAAAGCAGCCACCAATTAAGAAAGCGTTCAAGCTCAACAC 
CCACTACCTAAAAAATCCCAAACATATAACTGAACTCCTCACACCCAATTGGACCA 
ATCTATCACCCTATAGAAGAACTAATGTTAGTATAAGTAACATGAAAACATTCTCC 

TCCGCATAAGCCTGCGTCAGATTAAAACACTGAACTGACAATTAACAGCCCAATAT
CTACAATCAACCAACAAGTCATTATTACCCTCACTGTCAACCCAACACAGGCATGC 
TCATAAGGAAAGGT 

Sequence ID 1448 

GGCCACCGGGTGCAAGGTCAGGGCTGGGGTGGAGGCTGGGAAGCCCAGGGCTTGGC

CCACTGTGGCCGCCTTGTGTGGTCACTGCTTTCCTGGGCCTGCTGTGAGCTCCCTC 
TAGGACCCCAGGCCTGTCTGGTGGGTCACTGTGACCACCACCTTGCACAGCACCTG 
GCGCGTGGCAGGTGCTCAAACATTACTTGTTTCGGAATGAACTTCATCTTGCTCTT 
GGCTTTTTGACTAATGCTGTGGAACATCTGACTAATTAGTGACTCTTTGGGGCCCC 

CAGTTTCCCAGCTATAAAGTGGTAATATTAAGATAATAATTCGGCCGGGCGCGGTG

GCTCACGCCTGTAATCCCAGCAGCACTTTGGGAGGCCGAGGTGGGCAGATCACGAG 
GTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCATCTCTACTAAAAA 
TACAAAAAATTANCCGGGCGTGGTGGCGGGCGCCTGTAGTCCCAGCTACTCANGAG 
GCTGANGCAGGAGAATGGTGTGAACCCGGGAGGCAGAGGTTGCAGTGAACCAAGAT 
CGNNCCACTGCACTCCAGCCTGGGCAACAGAGCGAGACTCCATCTTAAAAAA

Sequence ID 1449

AATCAGC3GCCGCAGTGTGTTCTGCGCCTGCCCAGAGCTGACTCCTGATTTAACCGC

TGGCGTAACCGCGGGTTGCACGCATGCGTGCTGAAAAGCCTTTCACCCTCACGTGG 

TTTCTTTTTTAACCAGTCATCAAGCGAGGCTCGCGCGCAGGCCCCGCGTTGGAAAA 

TGGCGGGGAAGCTGAAACCTCTGAATGTGGAGGCGCCAGAAGCTGCTGAGGAGGCT 

GAAGGTAGTGAGGGCAAGTGGGCTGCACTCCTTTCTCTCCAACCAGGGCAGAAAGG 

AGGGAGGATTCGTCCCATTACAATAATGAAATAATGATATTCTAATTTTTTTAAAT 
AAAATGTTAAGCCTTTTGTTATTGAA 

Sequence ID 1450 

GGAAANCATGAGGCTTCGGGAGCCGCTCCTGAGCGGCAGCGCCGCGATGCCAGGCG 

CGTCCCTACAGCGGGCCTGCCGCCTGCTCGTGGCCGTCTGCGCTCTGCACCTTGGC 

GTCACCCTCGTTTACTACCTGGCTGGCCGCGACCTGAGCCGCCTGCCCCAACTGGT 

CGGAGTCTCCACACCGCTGCAGGGCGGCTCGAACAGTGCCGCCGCCATCGGGCAGT 

CCTCCGGGGAGCTCCGGACCGGAGGGGCCCGGCCGCCGCCTCCTNTAGGCGCCTCC 

TCCCAGCCGCGCCCGGGTGGCGACTCCAGCCCAGTCGTGGATTCTGGCCCTGGCCC 

CGCTAGCAACTTGACCTCGGTCCCAGTGCCCCACACCACCGCACTGTCGCTGCCCG 

CCTGCCCTGAGGAGTCCCCGCTGCTTGGTAAGGACTCGGGTCGGCGCCAGTCGGAG 

GATTGGGACCCCCCCGGATTTCCCCGACAGGGTCCCCCANACATTCCCTCAGGCTG 

GCTCTTCTACGACAGCCAGCCTCCCTCTTCTGGATCAGAGTTTTAAATCCCANACA 

GAGGCTTGGGACTGGATGGGAGAGAAGGTTTGCGAGGTGGGTCCCTGGGGAGTCCT 

GTTGGAGGCGTGGGGCCGGGACCGCACAGGGAAGTCCCGAGGCCCCTCTAGCCCCA 

AAACCANAGAAGGCCTTGGAGACTTCCCTGCTGTGGCCCGAGGCTNAGGAAGTTTT 

GGAGTTTTGGGTCTGCTTANGGCTTCNAGCAGCCTTGCACTGAGAACTTTGGTAGG 

GACCTCGAGTAATCCACTCCNTTTTNGGGACTGACGTGAGGCTCCCGGTGGGGAAA 
GANACTGACCTNTC 

Sequence ID 1453 

CCGACCTGTCTCGCTCCGTGGCCTTAGCTGTGCTCGCGCTACTCTCTCTTTCTGGC 
CTGGAGGCTATCCAGCGTACTCCAAAGATTCAGGTTTACTCACGTCATCCAGCAGA 
GAATGGAAAGTCAAATTTCCTGAATTGCTATGTGTCTGGGTTTCATCCATCCGACA
TTGAAGTTGACTTACTGAAGAATGGAGAGAGAATTGAAAAAGTGGAGCATTCAGAC 
TTGTCTTTCAGCAAGGACTGGTCTTTCTATCTCTTGTACTACACTGAATTCACCCC 
CACTGAAAAAGATGAGTATGCCTGCCGTGTGAACCATGTGACTTTGTCACAGCCCA 
AGATAGTTAAGTGGGATCGAGACATGTAAGCAGCATCATGGAGGTTTGAAGATGCC 
GCATTTGGATTGGATGAATTCCAAATTCTGCTTGCTTGCTTTTTAATATTGATATG 
CTTATACACTTACACTTTATGCACAAAATGTAGGGTTATAATAATGTTAACATGGA 
CATGATCTTCTTTATAATTCTACTTTGAGTGCTGTCTCCATGTTTGATGTATCTGA
GCAGGTTGCTCCACAGGTAGCTCTAGGAGGGCTGGCAACTTAGAGGTGGGGAGCAG
AGAATTCTCTTATCCAACATCAACATCTTGGTCAGATTTGAACTCTTCAATCTCTT 
GCACTCAAAGCTTGTTAAGATAGTTAAGCGTGCATAAGTTAACTTCCAATTTACAT 
ACTCTGCTTAGAATTTGGGGGAAAATTTAGAAATATAATTGACAGGATTATTGGAA 
ATTTGTTATAATGAATGAAACATTTTTGTCATATAAGATTCATATTTACTTCTTAT
ACA 

Sequence ID 1454 

TAAATAGGGAATCCTTTCCCCATTGCTTGTTTTTCTCAGGTTTGTCAAAGATCAGA 
TAGTTGTAGATATGCGACGTTATTTCTGAGGGCTCTGTTCTGTTCCATTGATCTAT
ATCTCTGTCACATGCACACGTATGTTTGTTGTGGCACTATTCACAGTGGCAAAGAC 
TTGGAACCAACCCAAATGTCCAACAATGATAGACCGGGTTAAGAAAATGCGGCACA 
TATACACCATGGAATACTATGTAGCCATAAAAAATGATGAGTTCGTGTCCTTTGTA 
GGGACATGGATGAAATTGGAAATCATCATTCTCAGTAAACTATCGCAGGAACAAAA 
AACCAAACACTGCATATTCTCACTCATAGGTGGGAATTGAACAGTGGGAACACATG
GACACAGGAAGGGGAACATCACACTCTGAGGACTGTTGTGGGGTGGGGGGAGGGAG 
GAGGGATAGCATTGGGAGATATACCTAGTGCTGGATGACGAGTTAGTGGGTGCAGC 
GCACCAGCATGTCACATGTATACATATGTAACTAACCTGCACATTGTGCACATGTA 
CCCTAAAACTTAAGGTAT 


Sequence ID 1456 

CCGCAACAAACACGGGAGTGCAGATATCGCTGCGATGGGCTGATTTCCTTTATTTG 
GGTATATACCCAGCAGTGGGATTGCTGGATTGTATGGTAGCTCTATTAGTTTTTTG 

AGGAACCTCCAAACTGTTCTNCATAGTGGTTGTACTCATTTACATTCCCACTGTGA

ACCCTGAAAATTTGAGGCAGGTCTCAGTTAAATTAGAAAGTTGATTTTGCCAAGTT 
GGGGACACGCACTCGTGACACAGCCTCAGGAGGAACTGATGACATGTGCCCAGGTG 
GTCAGAGCACAGCTTGGTTTTATACATTTTAGGGAAACCTGAGCCATCAATCAACA 
TACGTAAAATGGGCCGGGCACAGCAGCTCAAGCTGTAATCCCAGCACTCTGGGAGG 

CCGAGGCGGGTGGATCACTTGAGGTCAGGAGTTCGAGACCAGCCTGGCCAACATGG

TGAAACCCCGTCTCTATTAAAAATACAAAGCTTAGCTGGATGTGGTGGCGCATGCC 
TGTAGTCCCAGCTGCTCTAGGAGGCTGAGGCATGAGAATTGCTTGAACCTGGGAGG 
CAGAGGCTGCAGTGAGCCGAGATCGAGCCACTATACTCCAGCCTGGTCAACAGAGT 
GAGACCCTGTCT 


Sequence ID 1460 

CCACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCACCTGACTCCTG
AGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGT

GGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGA 

GTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGG 

CTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAAC 

CTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGA 

TCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACT 

TTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGT 

GTGGCTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATT 

TCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATG 

AAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTG 

Sequence ID 1490

ATGGGCATCTCTCGGGACAACTGGCACAAGCGCCGCAAAACCGGGGGCAAGAGAAA 

GCCCTACCACAAGAAGCGGAAGTATGAGTTGGGGCGCCCAGCTGCCAACACCAAGA 

TTGGCCCCCGCCGCATCCACACAGTCCGTGTGCGGGGAGGTAACAAGAAATACCGT 

GCCCTGAGGTTGGACGTGGGGAATTTCTCCTGGGGCTCANAGTGTTGTACTCGTAA 

AACAAGGATCATCGATGTTGTCTACAATGCATCTAATAACGAGCTGGTTCGTACCA 

AGACCCTGGTGAAGAATTGCATCGTGCTCATCGACAGCACACCGTACCGACAGTGG 

TACGAGTCCCACTATGCGCTGCCCCTGGGCCGCAAGAAGGGAGCCAAGCTGACTCC 

TGAGGAAGAAGAGATTTTAAACAAAAAACGATCTAAAAAAATTCAGAAGAAATATG 

ATGAAAGGAAAAAGAATGCCAAAATCAGCAGTCTCCTGGAGGAGCAGTTCCAGCAG 

GGCAAGCTTCTTGCGTGCATCGCTTCAAGGCCGGGACAGTGTGGCCGAGCAGATGG 

CTATGTGCTAGAGGGCAAAGAGTTGGAGTTCTATCTTAGGAAAATCAAGGCCCGCA 

AAGGCAAATAAATCCTTGTTTTGTCTTCACCCATGTAATAAAGGTGTTTATTGTTT 
TTGTT 

Sequence ID 1491 

CTTNCACATACTGATTGATGTCTCATGTCTCTCTAAAATGTGTAAAACCAAGCTGT 
GCCCCAACCACCTTGGGNACATGTGGNGAGGACCTCCTGAGGCTGTGTCATGGGCA 
CACCTTAACCCTGGGAAAATAAACTTTCTAAACTGACTTGAGAGCTGTCTCAGATA 
TTCTGAGCTTACAGTTATTGTGAAATCATTTTAATTATAAATTAAGTGGAGATTTA 
CTTAAAATCATGTGTAGAAGTAGCCTGTGATATAGTCCTAGATACATACATTATCA 
TCTTATGTATCTTCCCTCCCTCTTCCAGGTTCTGATAAAAACAGATGAAATCTGAA 
AGACCATGACAGTAGTATTTTGAAAATGACAGTATTTGAAATTAAAAAATTGTAAA 
AGTGTTCTGTTCTATCACTGCCAAAGGATAAGTTACAAATTGGTTCTTGGAACGTA 
ATATGTACTATGTGCTTGCTATTTAATAATTTACCAGTCTTAGTCTTTTTTATTCA 
GACTAATTTTACCTTTTTTTAACCTATGACTCTTTAGTTATAGTAGTACAAAAAAG
TAGTTTTAGTTATAGTTTTAGTTGTAGTACAAAAAAGCATTTTCTGTAAGCTTAAT
TTCTTTCCCCTTCCCGCTTTCCCAGTCAGATGACTTTAGTGATTTGGAGTTGTGTG 
CTTTATAAGTGCATTCCTCAGAGGACTTAATATTACTAAGATTTTAGCAACNCTGA 
AATATGTT 


Sequence ID 1492 

TGTNCCTGTAGTCCTGTGTGGGAGGATTGCCTGAGCCTAGGAGCTCAAAGTTGCAG 
TGAGCCCAGATCGNGNCATTGCAGTCCAGCCTGGGTGACAGAGTGAGACCCCATGT 
CAAAAAAAAAAAAACAAAAAACAGGGGCCTGCCTCANCCAGCAGGTGAGGTCTGCC 

ACTGAGAGC^CT^CTAGCAGCAGGAACAGCCTCCACCCCCACACTGGAATCAAGTT
TTTTGGGTCAGCCTTAGGAGCTAANAAAGGGCCTAGTTTGNCTAAATAGCAGGAGT 
TATATCCAGGGATCTTCAGGCCCAGGAATGCTAATGAGTAGGCATTCCATGGGCCC 
TGGGAATGGCTTTGTGTGCCANAAATGATGGCCACAAAGGCCTTGCTGCCTTTTTT 
CAAAATGGCTGCATCCAGCTGAGTGCTCTCTGCCAAAGGGGANAANAAAATAAGTC 

TCCAGTGCATTTAGATTGGTCTCTCATCATCTCTCTCCTTTTTGTTTTTATTAGTC
TCCTTAACC^AAACTGCCAAGAAAGGCTTGGAATTGAAACAAAACCTGATANAANA 
GGTAAGAGGTTGTTCTTTT 

Sequence ID 1493

TGTNTCAAAAAAAAAAAAAAGAACGGNAATGTACTGGAGATGTATTTGATAACCAA
GGNTTTAGGTAAATTTTCACCAGTATTAGTTNTATTTGCAAACTGAAAAATGTTGT 
AGGCTTAATATAAAATAACCACATTAGTGAACATTATATCTCTTAGAAGAAAGGCC 
ATATTTTGCTCCTGCTTCTGTAAAAATATTATTTGTTTGAAGGGGAAATAATGGTA 
GTGTGACCTTTCACTTAATTCCTACTCCCTTAATGTGAGAGAGACAAAATGAGCTG 

AAGAAGGAAAATTCTGGAGTTACACTCCACAACCTTGAACATACTGACGGACATCT

CTGTTTTGACAACGATTTCTCCATGCCACCCATGCTNTAATGCCTTGTGGATCACG 
GACAACCCTCTTTGCACAAGCTACAGCATCAGCGATGTTATCTTGCAGCAAAGCAC 
TGCAGGATAAATGACAGGCATTAACTGCTCCTGGGGTTTTGCCATCATTACACCAG 
TAGCGGCTATTGATCTGAl^TATCCCATAATCAGTGCTTCTGTCTCCAGCATTGTA 

GTTTGTAGCTCGTGTGTTGTAACCACTCTCCCATTTGGCCAAACACATCCAGTTTG

CTAGGCTGATTCCCCTGTAGCCATCCATTCCCAATCTTTTCAGAGTTCTGGCCAAC 
TCACACCTTTCAAA.GACCTTGCCCTGGACCGTAACAGAAAGGAGGACAAGCCCCAG 
AACAATGAGAGCCTTCATGTTGAC 

Sequence ID 1494

TTGGTACCCGGGAAATTCTTTGCCGCGTCGACGGCCGGTGAGGCAGATCACCTGAG 
CCCAGGAGTTCAGGACCAGCCTGGGCAGCATACCGGGATTCCATCTl^NACTAAAAA
CAGTAGGCTGGGTGTGGTGGCTCATGTCTGTAAGCTCAGGACTTTGGAAGGCCAAG

ATGGGAGGATCACTTGAGCCTGGGAGTTTGACACCAGCTTGAGCATCGTAGCCAGG 

CCCTGACTCTACAAAAAAGTGAAATAATTAGCCGAGTGTGGTGGTTCACACCTGTA 

ATCCCAGCTGCTCAGGAGGCTGAGGTAGGAGAATCATTTGAACCCGGGAGGTGGAG 

GTTGCAGTTAGCCGAGATCACGCCATTGCACTCCGGCCTGGGCGATAAAGCGAGAC 
TCTGTCTCAAAAAAAAAAAAAA 

Sequence ID 1495 

ATTCGGGCCGAGATGTCTCGCTCCGTGGCCTTAGCTGTGCTCGCGCTACTCTCTCT 

TTCTGGCCTGGAGGCTATCCAGCGTACTCCAAAGATTCAGGTTTACTCACGTCATC 

CAGCAGAGAATGGAAAGTCAAATTTCCTGAATTGCTATGTGTCTGGGTTTCATCCA 

TCCGACATTGAAGTTGACTTACTGAAGAATGGAGAGAGAATTGAAAAAGTGGAGCA 

TTCAGACTTGTCTTTCAGCAAGGACTGGTCTTTCTATCTCTTGTACTACACTGAAT 

TCACCCCCACTGAAAAAGATGAGTATGCCTGCCGTGTGAACCATGTGACTTTGTCA 

CAGCCCAAGATAGTTAAGTGGGATCGAGACATGTAAGCAGCATCATGGAGGTTTGA 

AGATGCCGCATTTGGATTGGATGAATTCCAAATTCTGCTTGCTTGCTTTTTAATAT 

TGATATGCTTATACACTTACACTTTATGCACAAAATGTAGGGTTATAATAATGTTA

ACATGGACATGATCTTCTTTATAATTCTACTTTGAGTGCTGTCTCCATGTTTGATG 

TATCTGAGCAGGTTGCTCCACAGGTAGCTCTAGGAGGGCTGGCACCTTAGAGGTGG 

GGAGCAGAGAATTCTCTTATCCAACATCAACATCTTGGTCAGATTTGAACTCTT 

Sequence ID G6 

GGATTTTTGGTCCGCACGCTCCTGCTCCTGACTCACCGCTGTTCGCTCTCGCCGAG 

GAACAAGTCGGTCAGGAAGCCCGCGCGCAACAGCCATGGCTTTTAAGGATACCGGA 

AAAACACCCGTGGAGCCGGAGGTGGCAATTCACCGAATTCGAATCACCCTAACAAG 

CCGCAACGTAAAATCCTTGGAAAAGGTGTGTGCTGACTTGATAAGAGGCGCAAAAG 

AAAAGAATCTCAAAGTGAAAGGACCAGTTCGAATGCCTACCAAGACTTTGAGAATC 

ACTACAAGAAAAACTCCTTGTGGTGAAGGTTCTAAGACGTGGGATCGTTTCCAGAT 

GAGAATTCACAAGCGACTCATTGACTTGCACAGTCCTTCTGAGATTGTTAAGCAGA 

TTACTTCCATCAGTATTGAGCCAGGAGTTGAGGTGGAAGTCACCATTGCAGATGCT 

TAAGTCAACTATTTTAATAAATTGATGACCAGTTGTTAAAA

Sequence ID - 61 nt: 362

GTTATTGAAAATTTTACTAATTTCTTACrrrrTAGGTTTTAGGAGAATACTTTTGGA 
TAATTGACTAGCCTCACAT^ATATTGATAGACKjTTCTTGAAAACTTTAATGCCAAT 
TCATGTATCITATGACTAAAATAGATAATCCATTTAGAAATTTAAGTCATrCrTGC 
GTGCTTGATATGTGTCAGCACTATCCAAGTTGCTAGGGGATACAATGGTGAAGTG 

aaaatatcagctaggtgccggtggctcacacctgttatcccaacagtttgggagg 

ccagggtgggaggatcactcaagcacangcgtttcacaccagcctggacaacat 

acaagaccccatctttaccaaaagttaag 



Sequence ID - 490 nt: 382 

ttttcttagaactitatttittctggccaggcgcagtggctcacacctgta^ 

agcactttgggaggccaaggcaggtcgatcacctgaggtcaggagctcaagacc 

agcctggccaacatqgtgaaaccctgtctctactaaaaatacaaaaattagctgg 

gcgtggtggcgcatgcctgtaatcccanctactcaggaggctgaggcaggagaa 

ttgtttgaacccgggaggcggaggttgcantgagccgagattgcgccactgcact 

ccagcctgggcaacagagcgaaactccatctcaaaaaaaaaaaaaaaaaacaac 

ctttattttttctgattttaaaagtaataactagtttgtagaaacattaaaagt 

Sequence ID - 892 nt: 559 

tctttcggaagcgcgccttgtgttggtacccgggaattcgcggccgcgtcgacgc 

ggtcgtaagggcrgaggatttttggtccgcacgctcctgctcctgactcaccgct 

gttcgctctcgccgaggaacaagtcggtcaggaagcccgcgcgcaacagccatg 

gcttttaaggataccggaaaaacacccgtggagccggaggtggcaattcaccga 

attcgaatcaccctaacaagccgcaacgtaaaatccttggaaaaggtgtgtgctg

acttgataagaggcgcaaaagaaaagaatctcaaagtgaaaggaccagttcgaa 

tgcctaccaagactttgagaatcactacaagaaaaactccttgtggtgaaggttc 

taagacgtgggatcgtttccagatgagaattcacaagcgactcattgacttgcac 

agtccttctgagattgttaagcagattacttccatcagtattgagccaggagttg

aggtggaagtcaccattgcagatgcttaagtcaactattttaataaattgatgac 

cagttgttt 



Sequence ID - 77 nt: 464

GCGGCrTGCTGTTGGTTGGGGGCCGTCCCGCTCCTAAGGCAGGAAGATGGTGGCCG 

CAAAGAAGACGAAAAAGTCGCTGGAGTCGATCAACTCTAGGCTCCAACTCGTTAT 

GAAAAGTGGGAAGTACGTCCTGGGGTACAAGCAGACTCTGAAGATGATCAGACA 

AGGCAAAGCGAAATTGGTCATTCTCGCTAACAACTGCCCAGCTTTGAGGAAATCT 

GAAATAGAGTACTATGCTATGTTGGCTAAAACTGGTGTCCATCACTACAGTGGCA 

ATAATATTGAACTGGGCACAGCATGCGGAAAATACTACAGAGTGTGCACACTGG 

CTATCATTGATCCAGGTGACTCTGACATCATTAGAAGCATGCCAGAACAGACTGG 

TGAAAAGTAAACCTTTTCACCTACAAAATTTCACCTOCAAACCTTAAACCTGCAA 

AATTTTCCTTTAATAAAATTTGCTTG

Claims:

1. A set of oligonucleotide probes, wherein said set 
comprises at least 10 oligonucleotides selected from: 
an oligonucleotide as described in Table 1 or derived
from a sequence described in Table 1, or an
oligonucleotide with a complementary sequence, or a 
functionally equivalent oligonucleotide. 

2. A set of oligonucleotide probes as claimed in claim
1 wherein said oligonucleotide probes are selected from:
an oligonucleotide as described in Table 2 or derived
from a sequence described in Table 2, or an
oligonucleotide with a complementary sequence, or a
functionally equivalent oligonucleotide.

3. A set of oligonucleotide probes as claimed in claim
1 wherein said oligonucleotide probes are selected from:
an oligonucleotide as described in Table 4 or derived
from a sequence described in Table 4, or an
oligonucleotide with a complementary sequence, or a
functionally equivalent oligonucleotide.

4. A set of oligonucleotide probes as claimed in any
one of claims 1 to 3, wherein each probe in said set
binds to a different transcript.

5. A set as claimed in any one of claims 1 to 4
consisting of from 10 to 500 oligonucleotide probes. 

6. An oligonucleotide probe wherein said probe is 
selected from the oligonucleotides listed in Table 1, or 
derived from a sequence described in Table 1, or a 
complementary sequence thereof. 


7. A set of oligonucleotide probes as claimed in any
one of claims 1 to 5, or an oligonucleotide probe as
claimed in claim 6, wherein each of said oligonucleotide
probes is from 15 to 200 bases in length.
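
By way of illustration only (this forms no part of the claims), the length range of claim 7 and the complementary-sequence alternative of claims 1 and 6 can be checked computationally. The following Python sketch is an assumption-laden example: the function names are hypothetical, and "N" is treated as an unresolved base, as in the sequence listing.

```python
# Hypothetical helpers; the names are illustrative and not taken from the specification.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def has_claimed_length(seq: str, min_len: int = 15, max_len: int = 200) -> bool:
    """Check the 15 to 200 base length range recited in claim 7."""
    return min_len <= len(seq) <= max_len


# First 20 bases of Sequence ID 1456, used purely as example input.
probe = "CCGCAACAAACACGGGAGTG"
complementary_probe = reverse_complement(probe)
length_ok = has_claimed_length(probe)
```

Applying reverse_complement twice returns the original sequence, which is a convenient sanity check on any probe list.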

8. A set of oligonucleotide probes as claimed in any
one of claims 1 to 5 or 7 or an oligonucleotide probe as
claimed in claim 6 or 7, wherein the transcript to which 
said probe binds is derived from a gene which is 
constitutively moderately or highly expressed. 

9. A set of oligonucleotide probes as claimed in any
one of claims 1 to 5, 7 or 8 or an oligonucleotide probe
as claimed in any one of claims 6 to 8, wherein said
probes are immobilized on one or more solid supports. 

10. A set of oligonucleotide probes or an
oligonucleotide probe as claimed in claim 9, wherein
said solid support is a sheet, filter, membrane, plate
or biochip. 

11. A polypeptide encoded by the mRNA sequence to
which an oligonucleotide as defined in claim 6 binds.

12. An antibody to a polypeptide as defined in claim
11.


13. A kit comprising a set of oligonucleotide probes
immobilized on one or more solid supports as defined in 
claim 9 or 10. 

14. A kit as claimed in claim 13 wherein said probes
are immobilized on a single solid support and each
unique probe is attached to a different region of said
solid support.

15. A kit as claimed in claim 13 or 14 further
comprising standardizing materials.

16. The use of a set of probes as described in any one
of claims 1 to 5 or 7 to 10 or a kit as described in any
one of claims 13 to 15 to determine the gene expression 
pattern of a cell which pattern reflects the level of 
gene expression of genes to which said oligonucleotide 
probes bind, comprising at least the steps of: 

a) isolating mRNA from said cell, which may 
optionally be reverse transcribed to cDNA; 

b) hybridizing the mRNA or cDNA of step (a) to a 
set of oligonucleotides or a kit as defined in any one 
of claims 1 to 5, 7 to 10 or 13 to 15; and

c) assessing the amount of mRNA or cDNA hybridizing 
to each of said probes to produce said pattern. 

17. A method of preparing a standard gene transcript
pattern characteristic of a disease or condition or 
stage thereof in an organism comprising at least the 
steps of: 

a) isolating mRNA from the cells of a sample of one 
or more organisms, having the disease or condition or 
stage thereof, which may optionally be reverse 
transcribed to cDNA; 

b) hybridizing the mRNA or cDNA of step (a) to a 
set of oligonucleotides or a kit as defined in any one
of claims 1 to 5, 7 to 10 or 13 to 15 specific for said
disease or condition or stage thereof in an organism and 
sample thereof corresponding to the organism and sample 
thereof under investigation; and 

c) assessing the amount of mRNA or cDNA hybridizing 
to each of said probes to produce a characteristic 
pattern reflecting the level of gene expression of genes 
to which said oligonucleotides bind, in the sample with 
the disease, condition or stage thereof. 



18. A method of preparing a test gene transcript
pattern comprising at least the steps of:

a) isolating mRNA from the cells of a sample of
said test organism, which may optionally be reverse
transcribed to cDNA;

b) hybridizing the mRNA or cDNA of step (a) to a
set of oligonucleotides or a kit as defined in any one
of claims 1 to 5, 7 to 10 or 13 to 15 specific for a
disease or condition or stage thereof in an organism and
sample thereof corresponding to the organism and sample
thereof under investigation; and

c) assessing the amount of mRNA or cDNA hybridizing
to each of said probes to produce said pattern
reflecting the level of gene expression of genes to
which said oligonucleotides bind, in said test sample.

19. A method of diagnosing or identifying or monitoring
a disease or condition or stage thereof in an organism,
comprising the steps of:

a) isolating mRNA from the cells of a sample of
said organism, which may optionally be reverse
transcribed to cDNA;

b) hybridizing the mRNA or cDNA of step (a) to a
set of oligonucleotides or a kit as defined in
any one of claims 1 to 5, 7 to 10 or 13 to 15
specific for said disease or condition thereof
in an organism and sample thereof
corresponding to the organism and sample
thereof under investigation;

c) assessing the amount of mRNA or cDNA
hybridizing to each of said probes to produce
a characteristic pattern reflecting the level
of gene expression of genes to which said
oligonucleotides bind in said sample; and

d) comparing said pattern to a standard
diagnostic pattern prepared as described in
claim 17 using a sample from an organism
corresponding to the organism and sample under
investigation to determine the degree of
correlation indicative of the presence of said
disease or condition or a stage thereof in the
organism under investigation.
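
As an illustration only of the comparison in step d), the sketch below computes a Pearson correlation coefficient between a test pattern and a standard pattern. The claim does not specify how the "degree of correlation" is measured, so the choice of Pearson correlation, the function name and the example values are all assumptions.

```python
from math import sqrt


def pearson(test_pattern, standard_pattern):
    """Pearson correlation between two equal-length expression patterns."""
    if len(test_pattern) != len(standard_pattern) or len(test_pattern) < 2:
        raise ValueError("patterns must have the same length (at least 2)")
    n = len(test_pattern)
    mean_t = sum(test_pattern) / n
    mean_s = sum(standard_pattern) / n
    cov = sum((t - mean_t) * (s - mean_s)
              for t, s in zip(test_pattern, standard_pattern))
    sd_t = sqrt(sum((t - mean_t) ** 2 for t in test_pattern))
    sd_s = sqrt(sum((s - mean_s) ** 2 for s in standard_pattern))
    if sd_t == 0 or sd_s == 0:
        raise ValueError("a constant pattern has no defined correlation")
    return cov / (sd_t * sd_s)


# Made-up signal values for three probes; not data from the specification.
standard = [1.0, 2.0, 3.0]
test = [2.0, 4.0, 6.0]
r = pearson(test, standard)
```

A value of r close to 1 would indicate that the test pattern closely tracks the standard pattern; any decision threshold would have to be chosen separately.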



20. A method as claimed in any one of claims 17 to 19
wherein said mRNA or cDNA is amplified prior to step b).

21. A method as claimed in any one of claims 17 to 20
wherein the oligonucleotides and/or the mRNA or cDNA are 
labelled. 


22. A method as claimed in any one of claims 17 to 21
wherein said probes are as defined in claim 3 and said 
disease is Alzheimer's disease. 

23. A method as claimed in any one of claims 17 to 21
wherein said probes are as defined in claim 2 and said 
disease is breast cancer. 



24. A method as defined in any one of claims 17 to 23, 
wherein said set of oligonucleotides as defined in any
one of claims 1 to 5, 7 to 10 or 13 to 15 are replaced 
with a set of oligonucleotides which are randomly 
selected, preferably from a cDNA library. 

25. A method of preparing a standard gene transcript
pattern characteristic of a disease or condition or
stage thereof in an organism comprising at least the
steps of:

a) releasing target polypeptides from a sample of
one or more organisms having the disease or condition or
stage thereof;

b) contacting said target polypeptides with one or
more binding partners, wherein each binding partner is
specific to a marker polypeptide (or a fragment thereof)
encoded by the gene to which an oligonucleotide of Table
1 (or derived from a sequence described in Table 1)
binds, to allow binding of said binding partners to said
target polypeptides, wherein said marker polypeptides
are specific for said disease or condition thereof in an
organism and sample thereof corresponding to the
organism and sample thereof under investigation; and

c) assessing the target polypeptide binding to said
binding partners to produce a characteristic pattern
reflecting the level of gene expression of genes which
express said marker polypeptides, in the sample with the
disease, condition or stage thereof.


26. A method of preparing a test gene transcript
pattern comprising at least the steps of:

a) releasing target polypeptides from a sample of
said test organism;

b) contacting said target polypeptides with one or
more binding partners, wherein each binding partner is
specific to a marker polypeptide (or a fragment thereof)
encoded by the gene to which an oligonucleotide of Table
1 (or derived from a sequence described in Table 1)
binds, to allow binding of said binding partners to said
target polypeptides, wherein said marker polypeptides
are specific for said disease or condition thereof in an
organism and sample thereof corresponding to the
organism and sample thereof under investigation; and

c) assessing the target polypeptide binding to said
binding partners to produce a characteristic pattern
reflecting the level of gene expression of genes which
express said marker polypeptides, in said test sample.

27. A method of diagnosing or identifying or
monitoring a disease or condition or stage thereof in an
organism comprising the steps of:

a) releasing target polypeptides from a sample of
said organism;

b) contacting said target polypeptides with one or
more binding partners, wherein each binding partner is
specific to a marker polypeptide (or a fragment thereof)
encoded by the gene to which an oligonucleotide of Table
1 (or derived from a sequence described in Table 1)
binds, to allow binding of said binding partners to said
target polypeptides, wherein said marker polypeptides
are specific for said disease or condition thereof in an
organism and sample thereof corresponding to the
organism and sample thereof under investigation;

c) assessing the target polypeptide binding to said
binding partners to produce a characteristic pattern
reflecting the level of gene expression of genes which
express said marker polypeptides in said sample; and

d) comparing said pattern to a standard diagnostic
pattern prepared as described in claim 25 using a sample
from an organism corresponding to the organism and
sample under investigation to determine the degree of
correlation indicative of the presence of said disease
or condition or a stage thereof in the organism under
investigation.

28. A method as claimed in any one of claims 17 to 27
wherein said pattern is expressed as an array of numbers
relating to the expression level associated with each
probe.



29. A method as claimed in any one of claims 17 to 28
wherein said organism is a eukaryotic organism,
preferably a mammal.






30. A method as claimed in claim 29 wherein said
organism is a human.

31. A method as claimed in any one of claims 17 to 30
wherein the data making up said pattern is 
mathematically projected onto a classification model. 
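
Purely as an illustration of what projecting a pattern onto a classification model can mean, the sketch below applies a linear model to a pattern expressed as an array of numbers. The weights, bias, threshold and class labels are hypothetical; the claim does not prescribe a particular model.

```python
def project(pattern, weights, bias=0.0):
    """Project an expression pattern onto a linear model: w . x + b."""
    if len(pattern) != len(weights):
        raise ValueError("pattern and weights must have the same length")
    return sum(p * w for p, w in zip(pattern, weights)) + bias


def classify(pattern, weights, bias=0.0, threshold=0.0):
    """Assign a class from the projected score (labels are illustrative)."""
    return "disease" if project(pattern, weights, bias) > threshold else "normal"


# Hypothetical two-probe model; not parameters from the specification.
weights = [0.5, -0.25]
score = project([1.0, 2.0], weights)
label = classify([2.0, 0.0], weights)
```

In practice the weights would come from a calibration model fitted to standard patterns rather than being set by hand.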

32. A method as claimed in any one of claims 17 to 31
wherein said disease is cancer or a degenerative brain
disorder.


33. A method as claimed in any one of claims 17 to 32 
wherein said sample is tissue, body fluid or body waste. 

34. A method as claimed in any one of claims 17 to 33 
wherein said sample is peripheral blood. 

35. A method as claimed in any one of claims 17 to 34 
wherein the cells in the sample are not disease cells, 
have not been in contact with such cells and do not 
originate from the site of the disease or condition.

36. A method as claimed in any one of claims 19 to 35 
for the diagnosis, identification or monitoring of two 
or more diseases, conditions or stages thereof in an 
organism, wherein said pattern produced in step c) is 
compared to at least two standard diagnostic patterns 
prepared as described in claim 17 or 25, wherein each 
standard diagnostic pattern is a pattern generated for a 
different disease or condition or stage thereof. 

37. A method of identifying probes useful for
diagnosing or identifying or monitoring a disease or
condition or stage thereof in an organism, comprising
the steps of:

a) immobilizing a set of oligonucleotide probes,
preferably as described hereinbefore, on a
solid support;

b) isolating mRNA from a sample of a normal
organism (normal sample), which may optionally
be reverse transcribed to cDNA;

c) isolating mRNA from a sample from an organism,
corresponding to the sample and organism of
step (b), which is known to have said disease
or condition or a stage thereof (diseased
sample), which may optionally be reverse
transcribed to cDNA;

d) hybridizing the mRNA or cDNA of steps (b) and
(c) to said set of immobilized oligonucleotide
probes of step (a); and

e) assessing the amount of mRNA or cDNA
hybridizing to each of said oligonucleotide
probes to determine the level of gene
expression of genes to which said
oligonucleotide probes bind in said normal and
diseased samples to generate a gene expression
data set for each sample;

f) normalizing and standardizing said data set of
step (e);

g) constructing a calibration model for
classification, preferably using the
statistical techniques Partial Least Squares
Discriminant Analysis (PLS-DA) and Linear
Discriminant Analysis (LDA);

h) performing JackKnife analysis and identifying
those oligonucleotide probes which are
required for classification of said disease
and normal samples into their respective
groups.
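
As an illustration only of steps f) to h), the following sketch standardizes a small data set, then runs a leave-one-out (jackknife-style) loop to see which probes suffice to classify the samples. A nearest-centroid classifier is substituted here for PLS-DA and LDA to keep the example self-contained; the data are synthetic, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Column-wise z-score: zero mean and unit variance per probe."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant probes
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def nearest_centroid(x, centroids):
    """Return the label whose centroid is closest to x (squared distance)."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: dist2(x, centroids[label]))


def jackknife_accuracy(rows, labels, probes):
    """Leave-one-out accuracy using only the probe columns in `probes`."""
    correct = 0
    for i in range(len(rows)):
        centroids = {}
        for label in set(labels):
            members = [[r[p] for p in probes]
                       for j, (r, l) in enumerate(zip(rows, labels))
                       if j != i and l == label]
            centroids[label] = [mean(c) for c in zip(*members)]
        held_out = [rows[i][p] for p in probes]
        correct += nearest_centroid(held_out, centroids) == labels[i]
    return correct / len(rows)


# Synthetic data: 6 samples x 3 probes; only probe 0 separates the groups.
rows = [[1.0, 5.0, 2.0], [1.2, 4.0, 2.1], [0.9, 6.0, 1.9],
        [3.0, 5.1, 2.0], [3.1, 4.2, 2.2], [2.9, 5.9, 1.8]]
labels = ["normal"] * 3 + ["diseased"] * 3

z = standardize(rows)
per_probe = {p: jackknife_accuracy(z, labels, [p]) for p in range(3)}
```

Probes whose individual or combined leave-one-out accuracy stays high are candidates for retention. Standardizing before the loop, as done here for brevity, lets the held-out sample influence the scaling, so a stricter analysis would standardize within each iteration.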
Effect of Direct Standardization (DS) on the Alzheimer
Data measured in two different series of experiments